OmkarPathak / ResumeParser

A simple resume parser used for extracting information from resumes
MIT License

Different results when running parser directly #31

Closed dickyj closed 4 years ago

dickyj commented 4 years ago

Hi Omkar, great script here. I did run into a mysterious bug, though.

I tried parsing other PDF files two ways: using pyresparser on the command line, and running the code directly under the https://github.com/OmkarPathak/pyresparser project. Strangely, I get different results, and I have not been able to figure out why. For example, in this project, when I run pyresparser -f newResume.pdf, it correctly parses the mobile number: 'mobile_number': '+6225########'. But when I run python command_line.py -f newResume.pdf directly under https://github.com/OmkarPathak/pyresparser, I instead get the year as the mobile_number, e.g. 'mobile_number': '001 1995'.

It somehow confuses the year with the mobile number. Initially I ran the two under different operating systems and thought it was OS-related, but apparently it's not. Can you point me to what may have gone wrong? Thanks.

OmkarPathak commented 4 years ago

@dickyj that's strange. Can you please provide your resume sample so I can test it in my environment?

OmkarPathak commented 4 years ago

One possible reason is that running command_line.py invokes the latest code, and I have changed the regex for extracting mobile numbers for the upcoming release. This is the latest regex, while this is the old one.

dickyj commented 4 years ago

> @dickyj that's strange. Can you please provide your resume sample so I can test it in my env

Omkar, thanks for the reply. I would like to give you the sample, but due to privacy concerns, how can I send it to you directly? The name is also parsed incorrectly; was the name parsing changed as well?

OmkarPathak commented 4 years ago

You can email me the same

dickyj commented 4 years ago

> You can email me the same

Ok sent.

OmkarPathak commented 4 years ago

Hi @dickyj, I tried parsing the resume you sent. It (pyresparser) parses the mobile numbers correctly. However, you are right that the name is not getting parsed correctly. It looks like we need to train the model more rigorously. For the time being, you can try parsing some other resume formats.

dickyj commented 4 years ago

@OmkarPathak Hi, what version of pyresparser did you use? When I use master, it does not seem to parse the mobile number correctly. I am getting this: 'mobile_number': '001 1995'

OmkarPathak commented 4 years ago

The version is still 1.0.5; I'm planning to release the next version soon. For the time being, you can run pyresparser as: python command_line.py -f <resume_file> -re <custom_regex_to_parse_mobile_number>

OmkarPathak commented 4 years ago

For your use case, you can use the following regex: (\d{5}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)[-\.\s]*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4})
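A quick way to see what this pattern accepts is to run it on a short string with Python's `re` module. The sketch below copies the regex from the comment above verbatim. The sample text and the `first_match` helper are hypothetical, and it assumes (not confirmed in this thread) that the parser reports the first match it finds. Under those assumptions, the last alternative, `\d{3}[-\.\s]??\d{4}`, accepts any three digits followed by four digits, which is the shape of the `'001 1995'` value reported earlier.

```python
import re

# The regex from the comment above, split across lines for readability only.
MOBILE_RE = re.compile(
    r"(\d{5}[-\.\s]??\d{3}[-\.\s]??\d{4}"
    r"|\(\d{3}\)[-\.\s]*\d{3}[-\.\s]??\d{4}"
    r"|\d{3}[-\.\s]??\d{4})"
)


def first_match(text):
    """Return the first substring the pattern matches, or None.

    Hypothetical helper: it stands in for "whatever the parser reports",
    assuming the parser keeps the first match.
    """
    match = MOBILE_RE.search(text)
    return match.group(0) if match else None


# Hypothetical resume fragment: an ID-like token followed by a year.
sample = "Student ID 001 1995 graduated"
print(first_match(sample))  # prints: 001 1995
```

If a non-phone token of that shape appears in a resume before the real number, a first-match strategy would pick it up, so a stricter custom pattern passed through `-re` may be worth trying.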

dickyj commented 4 years ago

I believe that's the same regex that I got when I cloned the repository. Anyway, I tried running with it, and I got the same erroneous results. Should it be the previous version?

OmkarPathak commented 4 years ago

Yes, that might be a version mismatch. Try uninstalling the previous version and running command_line.py without installing pyresparser (i.e., run from source), as the new version is not yet released on PyPI.
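When an installed release and a source checkout coexist, it can help to check which copy Python actually resolves. The following is a minimal sketch using only the standard library (`importlib.metadata`, available in Python 3.8+). The `describe_install` helper is hypothetical and not part of pyresparser.

```python
import importlib.util
from importlib import metadata


def describe_install(package):
    """Report the installed distribution version and the import location.

    Hypothetical helper for illustration; works for any top-level package name.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # no installed distribution with that name

    # find_spec locates the module on sys.path without importing it.
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {"version": version, "location": location}


print(describe_install("pyresparser"))
```

If `version` is `None` but `location` points into the cloned repository, the code is running from source. If `location` points into `site-packages`, the installed release is still being picked up.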

OmkarPathak commented 4 years ago

@dickyj closing this due to inactivity