I am using the following code:
doc = nlp(text)
for token in doc:
    if token.pos_ == 'PUNCT':
        text = text.replace(token.text, '')
With the following raw text, read from a PDF using pyPDF:
"with a proven track record of delivering strategic financial solutions for clients. Highly accomplished"
it is being converted to:
"with a prven track recrd f delivering strategic financial slutins fr clients Highly accmplished"
I reported this behavior to the creator of the package I am using, Resume Matcher. In the meantime, I can keep the letter "o" in the output using this workaround:
doc = nlp(text)
for token in doc:
    if token.pos_ == 'PUNCT' and token.text != 'o':
        text = text.replace(token.text, '')
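That the workaround helps suggests a token whose text is exactly `o` was tagged `PUNCT`. Because `str.replace` removes every occurrence of a substring in the whole string, one such token would be enough to strip every "o" from the text. A minimal sketch of that effect, without spaCy; the `flagged` list is hypothetical and only stands in for the token texts assumed to have been tagged `PUNCT`:

```python
# Hypothetical: token texts assumed to have been tagged PUNCT by the pipeline.
flagged = [".", "o"]

text = (
    "with a proven track record of delivering strategic financial "
    "solutions for clients. Highly accomplished"
)

for token_text in flagged:
    # str.replace removes EVERY occurrence in the string, not just the token
    text = text.replace(token_text, "")

print(text)
```

Under these assumptions the result matches the converted text shown earlier, which points at the replace step rather than at the raw text itself.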
There may be an issue with how the text is read in by pyPDF, but the output of the pyPDF function looks correct.
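One way to avoid the side effect, whatever the source of the stray token, might be to rebuild the text from the tokens instead of calling `str.replace` on the full string. The sketch below is an assumption, not the Resume Matcher implementation: `FakeToken` and `drop_punct` are hypothetical names, and `FakeToken` only mimics the `text`, `pos_`, and `whitespace_` attributes of a spaCy token so the example runs without a model.

```python
from dataclasses import dataclass


@dataclass
class FakeToken:
    """Hypothetical stand-in for a spaCy Token (text, pos_, whitespace_)."""
    text: str
    pos_: str
    whitespace_: str = " "


def drop_punct(tokens):
    """Rebuild the text token by token, skipping only tokens tagged PUNCT."""
    parts = []
    for token in tokens:
        if token.pos_ != "PUNCT":
            parts.append(token.text)
        # Keep the trailing whitespace even when the token itself is dropped,
        # so neighbouring words do not get glued together.
        parts.append(token.whitespace_)
    return "".join(parts).strip()


# Assumed tagging: a stray "o" (e.g. a bullet glyph) and "." are PUNCT.
tokens = [
    FakeToken("o", "PUNCT"),
    FakeToken("proven", "ADJ"),
    FakeToken("record", "NOUN", ""),
    FakeToken(".", "PUNCT"),
    FakeToken("Highly", "ADV", ""),
]
cleaned = drop_punct(tokens)
print(cleaned)
```

With a real pipeline the same function could presumably be called as `drop_punct(nlp(text))`, since only the flagged tokens are skipped and the letters inside other words are left alone.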
Info about spaCy
Python 3.9.0, Windows 10