OmkarPathak / pyresparser

A simple resume parser used for extracting information from resumes
GNU General Public License v3.0
Significant struggles with name identification #31

Open BenSturgeon opened 4 years ago

BenSturgeon commented 4 years ago

Thank you very much for the work you've done on this.

While the results are currently fairly good, I've noticed that name extraction is a big struggle. I even ran your own resume through the system as a sample, and it returned "www.omkarpathak.in" for the name field.

Do you think adding negative patterns for it to check against is the smartest short-term solution for this problem? Otherwise, do you think the NLP model needs more training on names?
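To make the negative-pattern idea concrete, here is a minimal sketch of what I have in mind. It is not pyresparser's actual code: `NEGATIVE_PATTERNS`, `is_plausible_name`, and `pick_name` are hypothetical names, and the pattern list is only an assumed starting point (URLs, emails, digits, bare domains). The idea is to reject any name candidate that matches one of the patterns and fall back to the next candidate.

```python
import re

# Hypothetical negative patterns: things a person's name should never contain.
NEGATIVE_PATTERNS = [
    re.compile(r"(https?://|www\.)", re.IGNORECASE),        # URLs
    re.compile(r"\S+@\S+\.\S+"),                            # email addresses
    re.compile(r"\d"),                                      # digits (phones, dates)
    re.compile(r"\.(com|in|org|net|io)\b", re.IGNORECASE),  # bare domains
]


def is_plausible_name(candidate: str) -> bool:
    """Return False if the candidate is empty or matches any negative pattern."""
    text = candidate.strip()
    if not text:
        return False
    return not any(p.search(text) for p in NEGATIVE_PATTERNS)


def pick_name(candidates):
    """Return the first candidate that survives the negative patterns, else None."""
    for candidate in candidates:
        if is_plausible_name(candidate):
            return candidate.strip()
    return None


if __name__ == "__main__":
    # Candidates as an extractor might rank them (assumed input).
    print(pick_name(["www.omkarpathak.in", "Omkar Pathak"]))
```

This only filters out obviously wrong candidates; it does nothing to help the model find the right name when none of the candidates is correct, so it would complement more training rather than replace it.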

If you need more data, I have access to a large number of CVs which I'd be happy to share.

Thanks again for your continued work on this project.

OmkarPathak commented 4 years ago

@BenSturgeon yes. We need a large dataset of resumes to train the model to produce more robust results. If you can share the CVs, it would be really helpful 😄

BenSturgeon commented 4 years ago

@OmkarPathak Awesome, I'll send you an email with a Google Drive link to a large number of CVs.

I'd be happy to contribute by helping with labeling as well if you'd be interested in sharing the process with me.

OmkarPathak commented 4 years ago

@BenSturgeon I would be happy to share.

aditya-malte commented 3 years ago

Hi, so do we now have a large dataset? It would be great if it were open source.