Franck-Dernoncourt / NeuroNER

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.
http://neuroner.com

Dependency of pretrained model on provided "test" set #128

Open StanislavPy opened 5 years ago

StanislavPy commented 5 years ago

Hello. I would like to thank you for a great model, but I have found some strange behaviour in it.

The situation is as follows:

  1. I trained the model on my own small dataset, and it achieved a good weighted F1-score on the validation set.
  2. I used `prepare_pretrained_model` and chose the epoch that gave the highest weighted F1-score on the validation set.
  3. After that, in order to use the prediction mode of NeuroNER, I need to initialize the model with `pretrained_model_folder='mymodel'` and `use_pretrained_model=True`, and it is also obligatory to provide `dataset_text_folder` containing a deploy or test set (see the sketch after this list).
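For reference, step 3 looks roughly like the sketch below, assuming the pip-packaged API (`from neuroner import neuromodel`); the folder names are placeholders for my own setup:

```python
from neuroner import neuromodel

# Load the previously trained model for prediction only.
# 'mymodel' is the folder produced by prepare_pretrained_model,
# and 'data/mydata' must contain a deploy/ or test/ set.
nner = neuromodel.NeuroNER(
    train_model=False,
    use_pretrained_model=True,
    pretrained_model_folder='mymodel',
    dataset_text_folder='data/mydata',
)
```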

After that, I ran into a problem.

I want to use NeuroNER in production as a service, so it is crucial to keep the model loaded in memory the whole time and simply call `nner.predict()` on each new text that reaches the service. However, I found that the predictions can vary considerably depending on which data is supplied via `dataset_text_folder`. When I provide the train set as the 'test' set, the results are fine; but when I use the valid set as the 'test' set, the results are different and much worse.

What could be the reason for this behaviour?