DataTurks-Engg / Entity-Recognition-In-Resumes-SpaCy

Automatic Summarization of Resumes with NER -> Evaluate resumes at a glance through Named Entity Recognition
https://medium.com/@dataturks/automatic-summarization-of-resumes-with-ner-8b97a5f562b
443 stars, 215 forks

Saving/Loading Custom Dataset #25

Closed: sayalraza closed this issue 4 years ago

sayalraza commented 4 years ago

Hi, I am trying to do inference with the given code. I get decent results on testdata.json right after training with nlp.update(). The problem appears when I save the trained model to output_dir with nlp.to_disk(): when I load it back with nlp2.from_disk(output_dir) or nlp2 = spacy.load(output_dir) and then test with nlp2, the results are very wrong. I also noticed that output_dir contains a number of files and folders instead of a single file (in Keras, for example, a saved model is a single '.h5' file). Am I missing something here? I am relatively new to spaCy.
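For reference, a save/load round trip that should give identical predictions before and after reloading might look like the sketch below. It assumes spaCy v3 (the repository's original code targets an older API), and the toy training sentences and the `NAME`/`COMPANY` labels are hypothetical. It also shows that `nlp.to_disk()` writes a directory (config, metadata, vocab, tokenizer and one folder per pipeline component) rather than a single file, which is expected and not a sign that something went wrong.

```python
import random
import tempfile
from pathlib import Path

import spacy
from spacy.language import Language
from spacy.training import Example

# Hypothetical toy data in spaCy's (text, {"entities": [(start, end, label)]}) format.
TRAIN_DATA = [
    ("Alice worked at Google", {"entities": [(0, 5, "NAME"), (16, 22, "COMPANY")]}),
    ("Bob worked at Infosys", {"entities": [(0, 3, "NAME"), (14, 21, "COMPANY")]}),
]


def train(n_iter: int = 20) -> Language:
    """Train a blank English pipeline with a single NER component."""
    nlp = spacy.blank("en")
    nlp.add_pipe("ner")
    examples = [Example.from_dict(nlp.make_doc(t), a) for t, a in TRAIN_DATA]
    # initialize() infers the NER labels from the examples and returns an optimizer.
    optimizer = nlp.initialize(get_examples=lambda: examples)
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        nlp.update(examples, sgd=optimizer, losses=losses)
    return nlp


def ents(nlp: Language, text: str):
    """Return the predicted entities as (text, label) pairs."""
    return [(e.text, e.label_) for e in nlp(text).ents]


nlp = train()
with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "resume_ner"
    nlp.to_disk(output_dir)  # writes a directory, not a single file
    saved_files = sorted(p.name for p in output_dir.iterdir())
    nlp2 = spacy.load(output_dir)  # reload the whole pipeline from that directory
    text = "Carol worked at Google"
    same = ents(nlp, text) == ents(nlp2, text)

print(saved_files)
print(same)
```

If the reloaded pipeline disagrees with the in-memory one on the same input, the save/load step itself is suspect, which matches the version bug described in the reply below.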

sayalraza commented 4 years ago

Resolved. If your model gives bad results after loading from disk, it is caused by a bug in an older version of spaCy; update your spaCy package. After upgrading, training may stop with the conflicting-entities error (see #22). The newer spaCy version is right to raise it: the error comes from the dataset itself, which contains conflicting (overlapping) entity annotations. I manually removed the conflicting entities from both traindata.json and testdata.json, but I am not able to attach the JSON files here.
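Instead of removing conflicting entities by hand, the overlaps could be filtered programmatically. The sketch below assumes the annotations are already converted to spaCy's `(start, end, label)` character-offset format; the function name `remove_overlapping` and the sample labels are illustrative only. It uses a longest-span-first rule, the same preference as `spacy.util.filter_spans`, but applied to raw offsets. This is not necessarily how the cleaned files mentioned above were produced.

```python
from typing import List, Tuple

# (start, end, label) with an exclusive end offset, as in spaCy training data.
Entity = Tuple[int, int, str]


def remove_overlapping(entities: List[Entity]) -> List[Entity]:
    """Keep a non-overlapping subset of character spans.

    Exact duplicates are dropped first. Longer spans win; ties are broken
    by earlier start, then by label, so the result is deterministic.
    """
    kept: List[Entity] = []
    candidates = sorted(set(entities), key=lambda e: (-(e[1] - e[0]), e[0], e[2]))
    for start, end, label in candidates:
        # Two spans are disjoint if one ends at or before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# Hypothetical annotations: a nested span, a duplicate and a clean span.
entities = [
    (0, 5, "Name"),
    (0, 12, "Designation"),
    (16, 22, "Companies worked at"),
    (16, 22, "Companies worked at"),
]
print(remove_overlapping(entities))
# [(0, 12, 'Designation'), (16, 22, 'Companies worked at')]
```

Dropping the shorter span is only one policy; for resume data a reviewer may prefer to keep the more specific label, so it is worth checking which spans get discarded.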

Hafsa1992 commented 4 years ago

Hi, can you please provide a link where we can download your cleaned JSON data files?