OmkarPathak / pyresparser

A simple resume parser used for extracting information from resumes
GNU General Public License v3.0
773 stars 394 forks

Update to spacy 3.4.x #79

Open ruben-dedoncker opened 1 year ago

ruben-dedoncker commented 1 year ago

Updated the spacy NER model to version 3.4.x

zhuolisam commented 1 year ago

Hi, can you update the requirements.txt as well?

simsong commented 9 months ago

I've been reviewing this. The problem with upgrading to spaCy NER model version 3.4 is that the current resume-parsing code seems to bundle its own model. Do we know what that model is and what would be required to regenerate it?
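One way to find out what a bundled model is: spaCy pipeline directories normally ship a `meta.json` file recording the pipeline name, its version, and the spaCy version range it targets. The sketch below reads those fields using only the standard library. The helper name `read_model_meta` and the directory passed to it are hypothetical; where the bundled model actually lives in this repository is not stated in this thread.

```python
import json
from pathlib import Path


def read_model_meta(model_dir):
    """Return the identifying fields from a spaCy pipeline's meta.json.

    `model_dir` is a hypothetical path to the directory that holds the
    bundled pipeline; adjust it to wherever the model really lives.
    """
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf-8") as fh:
        meta = json.load(fh)
    # Keys that are absent from the file come back as None.
    return {key: meta.get(key) for key in ("lang", "name", "version", "spacy_version")}
```

Calling `read_model_meta("path/to/bundled/model")` (placeholder path) should show which spaCy release line the model was built for, which is a reasonable starting point for deciding how to regenerate it.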

ruben-dedoncker commented 9 months ago

I have already updated requirements.txt and also updated the bundled model using the available training data. This update works out of the box.

IvoLeist commented 3 months ago

@ruben-dedoncker thank you for publicly providing a fix that updates spaCy to version 3.4.x. I can confirm that your fork runs out of the box :rocket:

Some time has passed since you opened this PR, and spaCy is now at 3.7.4. I am not (yet) familiar with spaCy, but I am interested in learning a bit about it. If I wanted to retrain the model so that the warning below goes away, how much computing power and time would that require?

UserWarning: [W095] Model 'en_pipeline' (0.0.0) was trained with spaCy v3.4.1 and may not be 100% compatible with the current version (3.7.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
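As a stopgap until the model is retrained, the warning can be filtered with Python's standard `warnings` module. This is only a minimal sketch: it hides the message and does not make the model any more compatible with the installed spaCy version. The helper name `silence_w095` is made up for illustration, and the filter assumes the warning text begins with `[W095]` and is a `UserWarning`, as in the output above.

```python
import warnings


def silence_w095():
    """Ignore spaCy's W095 version-mismatch warning and nothing else."""
    # `message` is a regular expression matched against the start of the
    # warning text, so the square brackets have to be escaped.
    warnings.filterwarnings("ignore", message=r"\[W095\]", category=UserWarning)
```

Call `silence_w095()` before the model is loaded. Other warnings, including other spaCy warnings, are still shown.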

@OmkarPathak thank you for making your resume parser open source. Looks like a really interesting project :rocket: