Closed: puttapraneeth closed this issue 3 years ago.
That's weird. Are you sure you used the `python resume_parser.py` command only? Were there any other parameters passed while executing it?
Exactly the same. I did not add any parameters.
Nothing is getting executed after this line: `custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))`
I tried with both PDF and DOCX file types, which work with pyresparser, but not here.
I am using Windows 10. They have explained this issue here. So should I train it again on Windows 10 and try as explained there?
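Before re-training, it can help to confirm that the directory handed to `spacy.load()` actually contains a serialized model. A stdlib-only sketch; the file names in `EXPECTED` assume a typical spaCy v2 model directory layout and may need adjusting for your model:

```python
import os

# Minimal sketch: sanity-check the directory that resume_parser.py hands to
# spacy.load() before loading it. The EXPECTED names assume a typical spaCy v2
# model directory layout; adjust for your model.
EXPECTED = ["meta.json", "tokenizer", "vocab", "ner"]

def missing_model_parts(model_dir):
    """Return the expected model files/folders not present in model_dir."""
    present = set(os.listdir(model_dir))
    return [name for name in EXPECTED if name not in present]

# Usage inside resume_parser.py (same directory resolution the failing line uses):
# model_dir = os.path.dirname(os.path.abspath(__file__))
# print("missing:", missing_model_parts(model_dir))
```

If anything is reported missing, the load will fail regardless of OS, and the fix is a complete model directory rather than re-training.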
@puttapraneeth I have already tested the same on Windows. It has no such problems. No need to re-train it on Windows.
Can you please paste here the code you are using in `__main__` in the resume_parser.py file?
Definitely.
```python
if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count())

    resumes = []
    data = []
    for root, directories, filenames in os.walk('resumes/'):
        for filename in filenames:
            file = os.path.join(root, filename)
            resumes.append(file)

    results = [
        pool.apply_async(
            resume_result_wrapper,
            args=(x,)
        ) for x in resumes
    ]

    results = [p.get() for p in results]

    pprint.pprint(results)
```
Error:
```
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\pyresparser\pyresparser\resume_parser.py", line 131, in resume_result_wrapper
    parser = ResumeParser(resume)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\pyresparser\pyresparser\resume_parser.py", line 20, in __init__
    custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 133, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 791, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 630, in from_disk
    reader(path / key)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 781, in <lambda>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "resume_parser.py", line 156, in <module>
```
I tried this as well, which didn't work:
```python
if __name__ == '__main__':
    data = ResumeParser('OmkarResume.pdf').get_extracted_data()
    print(data)
```
Command: `python resume_parser.py`

Error:

```
Traceback (most recent call last):
  File "resume_parser.py", line 138, in <module>
```
I see. In this case we need to try re-training.
I am getting the below issue when I started training:

```
ValueError: [E103] Trying to set conflicting doc.ents: '(1155, 1199, 'Email Address')' and '(1143, 1240, 'Links')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap.
```
How do I resolve this overlap issue?
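E103 means two annotated spans overlap, as with (1155, 1199, 'Email Address') sitting inside (1143, 1240, 'Links'). spaCy provides `spacy.util.filter_spans` for `Span` objects; for raw (start, end, label) offsets in training data, an equivalent longest-span-wins filter can be sketched in plain Python. The resolution strategy here (keep the longer span) is an assumption; you may prefer to keep the more specific label instead:

```python
def drop_overlapping(entities):
    """Keep entities longest-first and drop any span that overlaps one
    already kept, mirroring the strategy of spacy.util.filter_spans.
    `entities` is a list of (start, end, label) character offsets."""
    kept = []
    # Sort by length (longest first), then by start position.
    for start, end, label in sorted(entities, key=lambda e: (e[0] - e[1], e[0])):
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)

# The two spans from the E103 message: 'Links' wins because it is longer.
print(drop_overlapping([(1155, 1199, 'Email Address'), (1143, 1240, 'Links')]))
# -> [(1143, 1240, 'Links')]
```

Running the annotations through such a filter before training removes the conflict that triggers E103.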
The first time it failed at:

```python
assert nlp2.get_pipe("ner").move_names == move_names
```

'O' is present in the move names of the new model but not in the original move names, hence the mismatch and the failure.
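The failed assert can be narrowed down with a set difference to see exactly which moves differ. A small sketch with stand-in move lists; the real lists come from `nlp.get_pipe("ner").move_names` before saving and after reloading:

```python
# Stand-in move lists for illustration only; in the training script these
# come from nlp.get_pipe("ner").move_names before saving (original_moves)
# and after reloading the model from disk (reloaded_moves).
original_moves = ['B-Email Address', 'I-Email Address', 'L-Email Address', 'U-Links']
reloaded_moves = ['O'] + original_moves

only_in_reloaded = set(reloaded_moves) - set(original_moves)
only_in_original = set(original_moves) - set(reloaded_moves)
print(only_in_reloaded)  # {'O'}
print(only_in_original)  # set()
```

This at least confirms whether 'O' is the only divergence before deciding how to handle the assert.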
When I ran it again I got the overlap issue. I searched about the overlap issue but didn't understand it. This might be a silly issue, but I am a newbie; could you help me resolve it?
Thanks.
@puttapraneeth I'm not sure about this error. Can you try raising this issue on spaCy's issue tracker, please?
I raised this on spaCy's issue tracker but didn't receive any response from them, hence closing this one.
Thanks, Praneeth
Hi @puttapraneeth,
Did you get the solution to this error?
> The first time it failed at `assert nlp2.get_pipe("ner").move_names == move_names`. 'O' is present in the move names of the new model but not in the original move names, hence the mismatch and the failure. When I ran it again I got the overlap issue. I searched about the overlap issue but didn't understand it. This might be a silly issue, but I am a newbie; could you help me resolve it?
> Thanks.
Hi @puttapraneeth, can you please paste the corrected moves file content here?
Or can you please just specify how I can resolve this error: `assert nlp2.get_pipe("ner").move_names == move_names`?
This is absolutely great.
Using the pyresparser package I am able to extract the fields from a resume. To check the implementation, I downloaded the code and did the setup as mentioned. When executed with the same resume it ended with an error; details are below. The resume used doesn't contain any images and works with pyresparser.
Command: python resume_parser.py
```
Traceback (most recent call last):
  File "resume_parser.py", line 133, in <module>
    data = ResumeParser('OmkarResume.pdf').get_extracted_data()
  File "resume_parser.py", line 20, in __init__
    custom_nlp = spacy.load(os.path.dirname(os.path.abspath(__file__)))
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\__init__.py", line 27, in load
    return util.load_model(name, **overrides)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 133, in load_model
    return load_model_from_path(Path(name), **overrides)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 173, in load_model_from_path
    return nlp.from_disk(model_path)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 791, in from_disk
    util.from_disk(path, deserializers, exclude)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 630, in from_disk
    reader(path / key)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\language.py", line 781, in <lambda>
    deserializers["tokenizer"] = lambda p: self.tokenizer.from_disk(p, exclude=["vocab"])
  File "tokenizer.pyx", line 391, in spacy.tokenizer.Tokenizer.from_disk
  File "tokenizer.pyx", line 432, in spacy.tokenizer.Tokenizer.from_bytes
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\spacy\util.py", line 606, in from_bytes
    msg = srsly.msgpack_loads(bytes_data)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly\_msgpack_api.py", line 29, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "C:\Users\Praneeth\Anaconda3\envs\pyparser\lib\site-packages\srsly\msgpack\__init__.py", line 60, in unpackb
    return _unpackb(packed, **kwargs)
  File "_unpacker.pyx", line 191, in srsly.msgpack._unpacker.unpackb
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte
```
I am unable to understand why it is failing and need your help resolving this.
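The UnicodeDecodeError surfaces while msgpack is decoding the serialized tokenizer, which usually points at a corrupted model file rather than at the parser code. A stdlib-only diagnostic sketch; the corruption causes checked below (a Git LFS pointer checked out without LFS, or otherwise mangled binary data) are common suspects on Windows checkouts, not confirmed for this repository:

```python
def inspect_model_file(path):
    """Report basic facts about a binary model file: its size, whether it is
    actually a Git LFS pointer checked out without LFS installed, and its
    first byte (UTF-8 text should not start with a continuation byte)."""
    with open(path, "rb") as f:
        data = f.read()
    return {
        "size": len(data),
        "is_lfs_pointer": data.startswith(b"version https://git-lfs"),
        "first_byte": hex(data[0]) if data else None,
    }

# Hypothetical usage, pointing at the tokenizer file the traceback reads:
# print(inspect_model_file("pyresparser/tokenizer"))
```

If the tokenizer file turns out to be an LFS pointer or only a few hundred bytes, re-fetching the model files (rather than changing the code) is the likely fix.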
Thanks, Praneeth