Open nleguillarme opened 5 years ago
The problem is that the package gets the text of the `AbstractText` tag, and in this example that element contains some HTML tags. See here: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=31015971
I changed line 156 in api.py to: `response = re.sub('<[/ ]*[a-z]{1,3}>', '', str(response.text))` followed by `return response`
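To illustrate why inline tags would cut the text short, here is a minimal sketch using only the standard library. The XML fragment is a hypothetical stand-in shaped like an efetch `AbstractText` element, not the real record, and reading the element with `itertext()` is an alternative shown for comparison, not necessarily how pymed parses the response.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment (not the real PubMed record): an abstract with inline markup.
SAMPLE = (
    "<AbstractText>Bald eagle (<i>Haliaeetus leucocephalus</i>) "
    "diet varies.</AbstractText>"
)

element = ET.fromstring(SAMPLE)

# .text only holds the text before the first child element,
# which reproduces the kind of truncation reported in this issue.
truncated = element.text

# itertext() yields the text of the element and all its descendants,
# including the tail text that follows each child.
full = "".join(element.itertext())

print(repr(truncated))
print(repr(full))
```

With this approach the markup never has to be removed from the raw response, so nested or namespaced child elements are handled the same way as `<i>` or `<sub>`.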
Many abstracts and titles are truncated.
> I changed the line 156 in api.py to: `response = re.sub('<[/ ]*[a-z]{1,3}>', '', str(response.text))` `return response`
It seems a useful solution for most articles, but it does not work for math articles with `<mml:math ....>` tags. Anyway, I suggest merging this one so that we have a significant fix in the meantime.
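A small sketch of how the proposed pattern behaves, assuming it is applied to the raw XML text before parsing. The sample strings and the helper name `strip_short_tags` are hypothetical and only serve the illustration.

```python
import re

# Pattern quoted in this thread: matches tags whose name is 1 to 3
# lowercase letters and that carry no attributes (e.g. <i>, </i>, <sub>).
TAG_PATTERN = re.compile('<[/ ]*[a-z]{1,3}>')


def strip_short_tags(xml_text: str) -> str:
    """Remove short inline tags, keeping the text they enclose."""
    return TAG_PATTERN.sub('', xml_text)


# Hypothetical inputs, not taken from a real record.
inline = "Bald eagle (<i>Haliaeetus leucocephalus</i>) and H<sub>2</sub>O"
math = (
    '<mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    '<mml:mi>x</mml:mi></mml:math>'
)

print(strip_short_tags(inline))
print(strip_short_tags(math))
```

The short inline tags are removed, while the MathML string comes back unchanged because its tag names contain a colon and attributes, neither of which the pattern allows. That matches the limitation described above.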
@iacopy
This issue still occurs. I installed pymed through pip as suggested here:
https://pypi.org/project/pymed/
Is the pip package up to date? Should I clone the git directly instead?
Or is this issue not fixed overall?
@vectorkt yes, the pip package is not up to date: these merge requests have not been merged, so the issue still occurs. This repo currently seems abandoned.
If you want some fixes (correct PMIDs, non-truncated texts, English-only abstracts, ...) you can use the `fork-fixes` branch of my fork. In your virtualenv, you can try `pip install -e git://github.com/iacopy/pymed.git@fork-fixes#egg=pymed`, preceded by `pip install requests` if needed. I'm actually using this.
Let me know.
While iterating over the articles returned by a PubMed query, I also noticed that the abstract is sometimes incomplete.
For instance, the query `((Haliaeetus leucocephalus[Title/Abstract])) AND ((prey[Title/Abstract]) OR (diet[Title/Abstract]))`
returns (when printing the first 10 results): `pubmed_id = '31015971'`, `abstract = 'Bald eagle ('`