allanj / ner_incomplete_annotation


What if the dev data is not completely labeled? #7

Closed Mobius-Ash closed 3 years ago

Mobius-Ash commented 3 years ago

Hi Allanj! Nice to see a paper on incomplete annotations for NER, but I still have some questions. In your code, the dev data is used to evaluate each fold model, without any entity removal. Does that mean we need high-quality dev data? And what if the dev data is also not completely labeled? Can your code still work well in that case? I hope to hear your ideas. Thanks!

allanj commented 3 years ago

I did think about this question before. We decided to keep the dev data clean so that the evaluation and the demonstration of our results would be reliable.

If the validation/development data is not completely labeled, then even when the model is genuinely getting better during training, the development results could get worse. For example, we might see high recall and low precision on validation during training, because the validation set has only a few annotated entities while our model tries to predict more on this corrupted validation set.
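To make that concrete, here is a tiny made-up example (not from the repo) of how span-level precision and recall get distorted when the dev annotation is incomplete:

```python
# Hypothetical numbers: the sentence really contains three entities,
# but only one of them was annotated in the dev set.
gold = {("PER", 0, 1)}
pred = {("PER", 0, 1), ("ORG", 5, 7), ("LOC", 10, 11)}  # model finds all three

tp = len(gold & pred)
precision = tp / len(pred)   # 1/3 ~= 0.33: the correct but unannotated spans count as errors
recall = tp / len(gold)      # 1/1 = 1.00: every annotated entity was found
f1 = 2 * precision * recall / (precision + recall)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.33 R=1.00 F1=0.50
```

So a genuinely better model can end up with a lower dev F1, which is why model selection on such a dev set can pick a worse checkpoint.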

So I guess the training process would end up selecting a bad model, and my code may not work well. It seems really hard to evaluate anything with incomplete validation data, because we cannot tell how good the model actually is during training.

But I guess this might be exactly the use case in your scenario? Honestly, I don't have a clear picture of what happens when the validation set is not completely labeled, but I would love to discuss possible ways to improve the model.

Mobius-Ash commented 3 years ago

Thanks for your reply. In fact, just when I wanted to use your method, I found to my dismay that I did not have completely labeled validation data, so I created this issue.

If you are interested, we can discuss my immature idea, though I am not sure whether it works. Following your method, what if we deliberately overfit on each fold? For example, we split the training data into fold A and fold B (this time combining the original train data and dev data into one training set). We overfit model A on fold A and then use model A to predict fold B. We can assume those predictions are high quality from model A's point of view, so we add to fold B the predicted entities that are not already annotated. Fold B then has both its original ground truth and model A's predictions, and we treat both as the new ground truth. We do the same thing for fold A using a model B overfit on fold B. After that, we re-overfit model A and model B on the updated folds. We iterate this a few times, or stop when there is little difference between the predictions and the ground truth; at that point we might consider fold A and fold B to be completely labeled.

I'm not sure whether this is theoretically feasible. I'd appreciate your advice. (Sorry for my poor English = =)
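To make the proposed loop concrete, here is a rough sketch in Python. `train_ner` and `predict_labels` are just placeholders for whatever NER model code is used (they are not functions from this repository), and the sketch makes no claim about convergence:

```python
def cross_fold_relabel(fold_a, fold_b, n_rounds=5):
    """Iteratively relabel two folds with models overfit on the opposite fold.

    fold_a / fold_b: lists of (tokens, tags) pairs whose tags may be incomplete.
    train_ner / predict_labels are placeholders for the actual NER model code.
    """
    for _ in range(n_rounds):
        model_a = train_ner(fold_a)   # (over)fit a model on fold A
        model_b = train_ner(fold_b)   # (over)fit a model on fold B

        new_b = merge_labels(fold_b, predict_labels(model_a, fold_b))
        new_a = merge_labels(fold_a, predict_labels(model_b, fold_a))

        # stop when the cross-fold predictions no longer change the labels
        if new_a == fold_a and new_b == fold_b:
            break
        fold_a, fold_b = new_a, new_b
    return fold_a, fold_b


def merge_labels(fold, predictions):
    """Naive token-level merge: keep the original (trusted) tags and copy a
    predicted tag only where the original tag is 'O'. Real code would merge
    whole predicted spans to keep the BIO scheme consistent."""
    merged = []
    for (tokens, gold_tags), pred_tags in zip(fold, predictions):
        tags = list(gold_tags)
        for i, tag in enumerate(pred_tags):
            if tags[i] == "O" and tag != "O":
                tags[i] = tag
        merged.append((tokens, tags))
    return merged
```

The merge only fills in positions that were 'O', so the original annotations are never overwritten.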

allanj commented 3 years ago

That's an interesting problem. One comment I have is that you would probably need to impose some penalty/regularization during the training of model A to avoid completely overfitting. Theoretically, it is not clear at what point the process would converge; it could converge to a bad model or a good one. It also depends on the initialization. Probably initializing the unknown labels to O is the best way to start.
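As a small illustration of the "initialize the unknown labels to O" point (just a sketch, not code from this repo):

```python
def init_unknown_labels(tokens, known_entities):
    """Start every token at 'O' and write in only the entity spans we trust.
    known_entities: list of (label, start, end) spans, end exclusive."""
    tags = ["O"] * len(tokens)
    for label, start, end in known_entities:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

# init_unknown_labels(["John", "lives", "in", "Paris"], [("PER", 0, 1)])
# -> ["B-PER", "O", "O", "O"]   (the unannotated "Paris" entity starts as O)
```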

Mobius-Ash commented 3 years ago

Thanks for your advice! Maybe I have to do some experiments on it. :)