allanj / ner_incomplete_annotation


Using the same dataset to train and evaluate, the model can't reach a 100% F1 score #12

Closed · possible1402 closed this issue 1 year ago

possible1402 commented 2 years ago

Hi, I want to make sure the model architecture works well, so I used the same dataset (the gold dataset, without removing any entities) both to train and to evaluate the model. The expected result is that the model overfits and the F1 score reaches 100%. With the CoNLL dataset this works as expected, but when I switch to my own dataset the F1 score only reaches around 80%, not 100%, and I'm confused by this result. The experiment log is here: wandb link. I also trained on the train set (18,000 samples) and evaluated on the dev set (3,000 samples), where the F1 score reaches 60%. I really hope you can help me out. Thanks!
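For reference, here is a minimal sketch of this kind of overfitting sanity check in plain PyTorch. It is not this repo's trainer or dataset: the toy tagger, the sizes, and the random placeholder tensors are all illustrative assumptions. If the architecture and loss are wired correctly, the loss on a memorized batch should collapse toward zero, and training-set accuracy (and hence F1 on the same data) should approach 100%.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, TAGS, BATCH, SEQ = 100, 5, 8, 12  # toy sizes, not from the issue

class TinyTagger(nn.Module):
    """A deliberately small BiLSTM tagger used only for the sanity check."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.lstm = nn.LSTM(32, 32, batch_first=True, bidirectional=True)
        self.out = nn.Linear(64, TAGS)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.out(h)

x = torch.randint(0, VOCAB, (BATCH, SEQ))  # stand-in for gold tokens
y = torch.randint(0, TAGS, (BATCH, SEQ))   # stand-in for gold tags

model = TinyTagger()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    logits = model(x)  # (BATCH, SEQ, TAGS)
    loss = loss_fn(logits.reshape(-1, TAGS), y.reshape(-1))
    loss.backward()
    opt.step()

# After enough steps the model should memorize this batch.
pred = model(x).argmax(-1)
acc = (pred == y).float().mean().item()
print(f"loss={loss.item():.4f}  token accuracy={acc:.3f}")
```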

allanj commented 2 years ago

Thanks. Can you let me know which version of this repo you are using (PyTorch or DyNet)?

allanj commented 2 years ago

Are you able to overfit your dataset with a normal LSTM-CRF model?
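For comparison, here is a minimal sketch of such a "normal" BiLSTM-CRF baseline under the same overfitting test. It assumes the third-party pytorch-crf package (`pip install pytorch-crf`), which is not part of this repo, and it reuses the same placeholder data and toy sizes as the sketch above. The training-set negative log-likelihood should approach zero if the baseline can overfit.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package (an assumption)

torch.manual_seed(0)
VOCAB, TAGS, BATCH, SEQ = 100, 5, 8, 12  # toy sizes, not from the issue

class BiLSTMCRF(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 32)
        self.lstm = nn.LSTM(32, 32, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(64, TAGS)
        self.crf = CRF(TAGS, batch_first=True)

    def emissions(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.proj(h)

    def loss(self, x, y):
        # The CRF forward pass returns the log-likelihood; negate it for a loss.
        return -self.crf(self.emissions(x), y, reduction='mean')

    def decode(self, x):
        # Viterbi decoding: returns the best tag sequence for each sentence.
        return self.crf.decode(self.emissions(x))

x = torch.randint(0, VOCAB, (BATCH, SEQ))  # stand-in for gold tokens
y = torch.randint(0, TAGS, (BATCH, SEQ))   # stand-in for gold tags

model = BiLSTMCRF()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    nll = model.loss(x, y)
    nll.backward()
    opt.step()

print("final NLL:", nll.item())  # should be near 0 if the baseline overfits
```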

possible1402 commented 1 year ago

Sorry for the late response. I'm using the PyTorch version. Actually, I figured this out just by running more epochs, and the F1 score can reach roughly 100%. I think this model just needs more time to fit the training data than a normal LSTM-CRF model. Thanks for reaching out, by the way.