spaCy is tagging the letter "o" as punctuation (PUNCT)

I am using the following code:

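import spacy

nlp = spacy.load("en_core_web_md")  # assumed pipeline; en_core_web_sm is also installed
doc = nlp(text)
for token in doc:
    if token.pos_ == 'PUNCT':
        # str.replace removes every occurrence of token.text in the whole string
        text = text.replace(token.text, '')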

on the following raw text, read from a PDF using pyPDF:

"with a proven track record of delivering strategic financial solutions for clients. Highly accomplished"

It is converted to:

"with a prven track recrd f delivering strategic financial slutins fr clients Highly accmplished"
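The output can be reproduced without spaCy at all. A minimal sketch, assuming the full resume text (not only this excerpt) contains a standalone "o" token, for example a sub-bullet glyph extracted from the PDF, that the tagger labels PUNCT, alongside the "." after "clients". Because str.replace removes every occurrence of a substring anywhere in the string, removing the token text "o" also strips the "o" inside "proven", "record", "of", and so on:

```python
text = (
    "with a proven track record of delivering strategic financial "
    "solutions for clients. Highly accomplished"
)

# Assumption: these are the texts of the tokens tagged PUNCT somewhere in
# the full document (a stray "o" bullet plus the "." after "clients").
punct_token_texts = ["o", "."]

for tok_text in punct_token_texts:
    # str.replace works on the raw string, not on tokens, so it removes
    # *every* "o" in the text, including those inside words.
    text = text.replace(tok_text, "")

print(text)
# → with a prven track recrd f delivering strategic financial slutins fr clients Highly accmplished
```

This prints exactly the string shown above, which suggests the tagger is likely mislabeling a single stray token and the missing letters come from the global replace.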

I reported this behavior to the creator of Resume Matcher, the package I am using. In the meantime, I can keep the letter "o" in the output with this workaround:

doc = nlp(text)
for token in doc:
    # skip tokens whose text is "o" so the replace no longer strips every "o"
    if token.pos_ == 'PUNCT' and token.text != 'o':
        text = text.replace(token.text, '')
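A more robust alternative to the workaround is to rebuild the string from the tokens instead of calling str.replace, so only the tagged token itself is dropped. This is a sketch, not Resume Matcher's or spaCy's own code: the helper name strip_punct_tokens and the FakeToken class are hypothetical. It relies only on the Token attributes text, whitespace_ and pos_, which spaCy provides; a dropped token still contributes its trailing whitespace so that "clients. Highly" does not collapse into "clientsHighly".

```python
from dataclasses import dataclass


def strip_punct_tokens(tokens) -> str:
    """Rebuild text from tokens, dropping only those tagged PUNCT.

    Works with any objects exposing .text, .whitespace_ and .pos_,
    such as the tokens yielded by iterating a spaCy Doc.
    """
    parts = []
    for tok in tokens:
        if tok.pos_ == "PUNCT":
            # Keep the whitespace that followed the dropped token.
            parts.append(tok.whitespace_)
        else:
            parts.append(tok.text + tok.whitespace_)
    return "".join(parts)


@dataclass
class FakeToken:
    # Hypothetical stand-in for spacy.tokens.Token, used only for this demo.
    text: str
    whitespace_: str
    pos_: str


tokens = [
    FakeToken("o", " ", "PUNCT"),  # assumed stray bullet glyph from the PDF
    FakeToken("proven", " ", "ADJ"),
    FakeToken("solutions", " ", "NOUN"),
    FakeToken("for", " ", "ADP"),
    FakeToken("clients", "", "NOUN"),
    FakeToken(".", " ", "PUNCT"),
    FakeToken("Highly", " ", "ADV"),
    FakeToken("accomplished", "", "ADJ"),
]
print(repr(strip_punct_tokens(tokens)))
```

With spaCy, the call would be strip_punct_tokens(nlp(text)). The letters inside "proven", "solutions" and "accomplished" survive because nothing is replaced in the raw string. If you would rather not depend on the model's POS prediction at all, spaCy also offers the lexical flag Token.is_punct, which is based on the token's characters.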

The problem may lie in how pyPDF reads the text, but when I inspect the output of the pyPDF function directly, the text looks correct.
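Extracted text can look correct and still contain such a token, since a bullet rendered as "o" reads like an ordinary letter. A quick check, sketched here on a hypothetical sample string (the helper name find_standalone_o is also made up), is to search the pyPDF output for a lowercase "o" that is not part of a word:

```python
import re


def find_standalone_o(text: str, context: int = 15):
    """Return (offset, snippet) for each lowercase 'o' not inside a word.

    A lone 'o' is unusual in English prose but can appear as a
    sub-bullet glyph in text extracted from PDFs.
    """
    hits = []
    for match in re.finditer(r"(?<!\w)o(?!\w)", text):
        start = max(match.start() - context, 0)
        end = match.end() + context
        hits.append((match.start(), text[start:end]))
    return hits


# Hypothetical extracted text with an "o" sub-bullet on the second line.
sample = "Summary\no Managed budgets\nwith a proven track record of delivering"
for offset, snippet in find_standalone_o(sample):
    print(offset, repr(snippet))
```

Running it on the real pyPDF output would show whether such a stray "o" exists and where it sits.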

Info about spaCy

Python 3.9.0 Windows 10

spaCy version: 3.6.0
Platform: Windows-10-10.0.19041-SP0
Python version: 3.9.0
Pipelines: en_core_web_md (3.6.0), en_core_web_sm (3.6.0)

explosion / spaCy issue #13221