Tencent / NeuralNLP-NeuralClassifier

An Open-source Neural Hierarchical Multi-label Text Classification Toolkit

How to solve the problem that topK's K is different for every input text? #38

Closed guotong1988 closed 4 years ago

guotong1988 commented 4 years ago

Currently, the K used for the top-K output is fixed.

Do you think training a classifier to predict the value of K for every input is a good solution?

Thank you very much.

coderbyr commented 4 years ago

In fact, topK only limits how many candidate categories are recalled. If your model is well trained, it should predict the exact or approximate number of categories in the ground truth, so you can just set K to a number (for example, 20) that is slightly larger than the maximum number of categories of any instance.
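A minimal sketch of how a fixed K can still give a different number of labels per input. It assumes the per-label probabilities are also filtered by a score threshold, which is not stated in this thread; `select_labels`, `top_k`, and `threshold` are hypothetical names for illustration, not the toolkit's API.

```python
from typing import List, Sequence


def select_labels(probs: Sequence[float],
                  top_k: int = 20,
                  threshold: float = 0.5) -> List[int]:
    """Return indices of labels scoring >= threshold, capped at top_k.

    Assumption: the threshold filter is illustrative; only the top_k cap
    is described in the discussion.
    """
    # Rank label indices by descending probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Keep at most top_k candidates, then drop those below the threshold.
    return [i for i in ranked[:top_k] if probs[i] >= threshold]


# Two inputs, same K, different number of predicted labels.
print(select_labels([0.9, 0.1, 0.7, 0.2]))  # [0, 2]
print(select_labels([0.1, 0.8, 0.3, 0.2]))  # [1]
```

With this kind of filtering, K acts only as an upper bound, so it does not need to match the true label count of each input.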

guotong1988 commented 4 years ago

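Thanks! I still don't quite understand. What I mean is: don't you usually take the top K of the results, with K fixed for every input? At prediction time, what if some inputs have many labels and others have few? @coderbyr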

coderbyr commented 4 years ago

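Here K mainly limits the maximum number of labels a sample can have. In general, K can be set to a fairly large number (larger than the number of labels on any single training sample). Under normal conditions the model's generalization will not be too poor, so the number of labels predicted for each sample will not exceed K.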
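One way to pick such a K is to take the largest label count seen in the training data and add a small margin. This is a sketch under assumptions: `choose_top_k`, `train_labels`, and `margin` are hypothetical names, and the training labels are assumed to be available as one list of label strings per sample.

```python
from typing import Sequence


def choose_top_k(train_labels: Sequence[Sequence[str]], margin: int = 2) -> int:
    """Return a K slightly larger than the max labels on any training sample."""
    # Largest number of labels attached to a single training sample.
    max_labels = max(len(labels) for labels in train_labels)
    # Add a small safety margin so K stays an upper bound at prediction time.
    return max_labels + margin


# Hypothetical training labels: one list of label names per sample.
train_labels = [["sports"], ["sports", "news", "finance"], ["news", "tech"]]
print(choose_top_k(train_labels))  # 5
```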