gijswobben / pymed

PyMed is a Python library that provides access to PubMed.
MIT License

Cherry branch #25

Open mbullmanFHCRC opened 5 years ago

mbullmanFHCRC commented 5 years ago

All Submissions:

New Feature Submissions:

  1. [ ] Does your submission pass tests (if applicable)?
  2. [x] Have you linted your code locally prior to submission (use flake8)?

Changes to Core Features:

I only want to commit 27aeba4 / 989a237. I tried cherry-picking the changes but couldn't get it to work.

Basically, when you use getContent / xml_element.findall(path) and the XML text contains italic or bold markup, you run into issues, since the XML parser doesn't handle those edge cases correctly.
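For anyone who wants to reproduce this, here is a minimal sketch of what I believe is going on. It assumes getContent ends up reading the .text attribute of the matched element (I have not confirmed this against the library source), and the sample XML is made up for illustration. With ElementTree, .text only holds the text before the first child element, so an inline tag cuts the value short.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like a PubMed record with inline italics.
sample = (
    "<Article>"
    "<ArticleTitle>Role of <i>BRCA1</i> in repair</ArticleTitle>"
    "</Article>"
)

root = ET.fromstring(sample)
title = root.find(".//ArticleTitle")

# .text stops at the first child element (<i>), so the title is truncated.
text_only = title.text

# itertext() walks the element and all of its descendants in document order.
full_text = "".join(title.itertext())

print(repr(text_only))
print(repr(full_text))
```

Joining itertext() would be another way to recover the full string without touching the response, but that is not what this PR does.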

To fix this, I'm removing <i>, </i>, <b>, </b> from the response string that _get returns, before xml.fromstring(response) is called.
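A rough sketch of the idea, not the exact code in the commits. It assumes the tags to strip are the italic and bold tags (i and b). The helper name strip_inline_tags, the regex, and the sample XML are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches opening and closing italic/bold tags only, e.g. <i>, </i>, <b>, </b>.
# Tags such as <id> are left alone because ">" must directly follow the name.
INLINE_TAGS = re.compile(r"</?(?:i|b)>")


def strip_inline_tags(response: str) -> str:
    """Remove inline formatting tags from a raw XML response string."""
    return INLINE_TAGS.sub("", response)


# Hypothetical response body with inline formatting in the abstract.
raw = (
    "<Article>"
    "<AbstractText>Loss of <i>TP53</i> is <b>common</b>.</AbstractText>"
    "</Article>"
)

# Clean the string first, then parse it.
root = ET.fromstring(strip_inline_tags(raw))
abstract = root.find(".//AbstractText").text
print(abstract)
```

The trade-off is that the formatting is lost, and other inline tags (sub, sup, and so on) would need to be added to the pattern if they cause the same problem.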

It seems to work in the quick test I ran on my own system. There are probably other special characters that could be added to the regex if needed, but for now this works for my purposes. Issue #23 mentions an incomplete abstract, which is a symptom I also saw on my end. The issue also arises in ArticleTitle if italics or bold are used.

I didn't test this very extensively, so hopefully it's not breaking anything else down the line.

Sorry for the extra commits! I'll try and figure out how to cherry-pick better in the future.

mbullmanFHCRC commented 5 years ago

Accidentally deleted the branch / PR.