LiyuanLucasLiu / LM-LSTM-CRF

Empower Sequence Labeling with Task-Aware Language Model
http://arxiv.org/abs/1709.04109
Apache License 2.0

How do you tune the model to get a large # of keywords outputted by the CRF layer? #69

Closed ravishchawla closed 4 years ago

ravishchawla commented 4 years ago

I read through the paper, and looked through the code in train_wc as well as the arguments that can be passed during initialization. One of the issues I am facing is that after fine-tuning the model on my own dataset, the number of keywords that are output varies significantly.

Some texts have no keywords, but still contain entities that should be found. Other texts get between 5 and 10 keywords. I am not trying to tune the maximum number of keywords, because I believe that filtering can be done in post-processing using the confidence scores.

I am interested in knowing if there is a way to tune the minimum number of keywords found, or lower the score threshold so more keywords are found in general.

LiyuanLucasLiu commented 4 years ago

Thanks for asking!

You can probably modify the value of the bias in the CRF for this. A reference (similar technique, different application) can be found at: https://arxiv.org/abs/1904.09331
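To illustrate the bias idea, here is a minimal sketch (not this repo's actual API; the toy tag set, scores, and the `o_idx`/`bias` parameters are all illustrative). Subtracting a bias from the O tag's emission scores before Viterbi decoding makes entity tags comparatively more attractive, so more keywords are emitted:

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Standard Viterbi decoding: return the highest-scoring tag sequence.

    emissions: (n_steps, n_tags) per-token tag scores
    transitions: (n_tags, n_tags) score of moving from tag i to tag j
    """
    n_steps, n_tags = emissions.shape
    score = emissions[0].copy()
    backptr = []
    for t in range(1, n_steps):
        # total[i, j] = best score ending in tag i, then transitioning to j
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr.append(total.argmax(axis=0))
        score = total.max(axis=0)
    best = [int(score.argmax())]
    for ptr in reversed(backptr):
        best.append(int(ptr[best[-1]]))
    return best[::-1]

def decode_with_o_bias(emissions, transitions, o_idx, bias):
    """Penalize the O (non-entity) tag by `bias` before decoding,
    which lowers the effective threshold for predicting entities."""
    adjusted = emissions.copy()
    adjusted[:, o_idx] -= bias
    return viterbi_decode(adjusted, transitions)

# Toy example: tag 0 = "O", tag 1 = "B-KW"; O narrowly wins at each token.
emissions = np.array([[1.0, 0.8],
                      [1.0, 0.9],
                      [1.0, 0.2]])
transitions = np.zeros((2, 2))

plain = viterbi_decode(emissions, transitions)
biased = decode_with_o_bias(emissions, transitions, o_idx=0, bias=0.5)
```

With no bias, all three tokens decode to O; with a bias of 0.5, the first two tokens flip to the keyword tag. Tuning this single scalar trades precision for recall without retraining.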

PS: this repo is outdated, you can try the vanillaNER repo for developing new models.