Franck-Dernoncourt / NeuroNER

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.
http://neuroner.com

Dependency of pretrained model on provided "test" set #128

Open StanislavPy opened 5 years ago

StanislavPy commented 5 years ago

Hello. I would like to thank you for a great model, but I have found some strange behaviour in it.

The situation is as follows:

  1. I trained the model on my own small dataset, and it achieved a good weighted F1-score on the validation set.
  2. I used `prepare_pretrained_model` and chose the epoch that gave the highest weighted F1-score on the validation set.
  3. After that, in order to use the prediction mode of NeuroNER, I need to initialize the model with `pretrained_model_folder='mymodel'` and `use_pretrained_model=True`, and it is also obligatory to provide `dataset_text_folder` containing a deploy or test set (see the sketch after this list).
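For reference, step 3 looks roughly like the sketch below, assuming the pip-packaged API (`from neuroner import neuromodel`); the folder names are placeholders for my own setup:

```python
from neuroner import neuromodel

# Load the previously trained model for prediction only.
# 'mymodel' is the folder produced by prepare_pretrained_model,
# and 'data/mydata' must contain a deploy/ or test/ set.
nner = neuromodel.NeuroNER(
    train_model=False,
    use_pretrained_model=True,
    pretrained_model_folder='mymodel',
    dataset_text_folder='data/mydata',
)
```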

After that, I ran into a problem.

I want to use NeuroNER in production as a service, so it is crucial to keep the model loaded in memory the whole time and simply call `nner.predict()` on each new text that reaches the service. However, I found that the predictions can vary considerably depending on which data is supplied via `dataset_text_folder`. When I provide the train set as the 'test' set, the results are fine; but when I use the valid set as the 'test' set, the results are different and much worse.

What could be the reason for this behaviour?