Incomplete abstract - Githubissues

gijswobben / pymed

PyMed is a Python library that provides access to PubMed.

MIT License

193 stars 111 forks

Incomplete abstract #23

Open nleguillarme opened 5 years ago

nleguillarme commented 5 years ago

While iterating over the articles returned by a PubMed query, I also noticed that the abstract is sometimes incomplete.

For instance:

Query: ((Haliaeetus leucocephalus[Title/Abstract])) AND ((prey[Title/Abstract]) OR (diet[Title/Abstract]))

Returns (when printing the first 10 results):
pubmed_id = '31015971'
abstract = 'Bald eagle ('

Keramatfar commented 5 years ago

The problem is that the package reads only the text of the AbstractText tag, and in this example that tag contains HTML tags, so the text stops at the first one. See here: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=31015971
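A minimal sketch of the truncation described above, assuming the abstract is read through the `.text` attribute of an `xml.etree.ElementTree` element (an assumption about pymed's internals, not something this thread confirms). The XML fragment is made up for illustration and is not the real record; `text_only` and `full_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment shaped like an efetch abstract; the wording is invented.
SAMPLE = (
    "<Abstract><AbstractText>Bald eagle (<i>Haliaeetus leucocephalus</i>) "
    "diet was studied.</AbstractText></Abstract>"
)


def text_only(element):
    """Return only the text that precedes the first child element."""
    return element.text


def full_text(element):
    """Return all text inside the element, including text in and after children."""
    return "".join(element.itertext())


node = ET.fromstring(SAMPLE).find("AbstractText")
print(repr(text_only(node)))  # stops right before the <i> tag
print(repr(full_text(node)))  # the whole sentence, without markup
```

With this fragment, `text_only` reproduces the reported symptom (`'Bald eagle ('`), because `.text` ends where the first child element begins, while `itertext()` walks through the nested elements and their trailing text.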

Keramatfar commented 5 years ago

I changed line 156 in api.py to:

    response = re.sub('<[/ ]*[a-z]{1,3}>', '', str(response.text))
    return response
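To show what that substitution does, here is a small standalone sketch that applies the same pattern to a made-up string instead of a live response. The sample text and the `strip_short_tags` name are hypothetical; only the regular expression comes from the comment above.

```python
import re

# Same pattern as in the comment: "<", optional "/" or spaces,
# one to three lowercase letters, then ">".
TAG_PATTERN = re.compile(r"<[/ ]*[a-z]{1,3}>")


def strip_short_tags(xml_text):
    """Remove short lowercase tags such as <i>, </i>, <sub> or <sup>."""
    return TAG_PATTERN.sub("", xml_text)


# Invented sample; not the actual efetch response.
sample = "<AbstractText>Bald eagle (<i>Haliaeetus leucocephalus</i>) diet.</AbstractText>"
print(strip_short_tags(sample))
```

The inline `<i>` and `</i>` tags are removed while `<AbstractText>` survives, since it contains uppercase letters and is longer than three characters. Once the inline tags are gone from the raw XML, the abstract text is no longer split across child elements.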

iacopy commented 4 years ago

Many abstracts and titles are truncated.

iacopy commented 4 years ago

> I changed line 156 in api.py to:
>
>     response = re.sub('<[/ ]*[a-z]{1,3}>', '', str(response.text))
>     return response

It seems a useful solution for most articles, but it does not work for math articles with `<mml:math ....>` tags. Anyway, I suggest merging this one so that we have a significant fix in the meantime.
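A short sketch of that limitation, using an invented MathML fragment: the pattern only matches tags made of one to three lowercase letters, so a namespaced tag such as `mml:math` (with a colon and attributes) is left in place. The second function is one possible alternative, not what the fork does: it parses the element and collects its text with `itertext()`, which ignores tag names altogether. Both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TAG_PATTERN = re.compile(r"<[/ ]*[a-z]{1,3}>")

# Invented fragment; the namespace declaration is needed for the XML to parse.
MATH_SAMPLE = (
    '<AbstractText>We show <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">'
    "<mml:mi>x</mml:mi></mml:math> holds.</AbstractText>"
)


def regex_strip(xml_text):
    """Apply the regex workaround from the thread."""
    return TAG_PATTERN.sub("", xml_text)


def parsed_text(xml_text):
    """Alternative sketch: parse the fragment and join all nested text."""
    return "".join(ET.fromstring(xml_text).itertext())


print(regex_strip(MATH_SAMPLE) == MATH_SAMPLE)  # the mml tags are not matched
print(parsed_text(MATH_SAMPLE))
```

The trade-off is that `itertext()` flattens formulas into their bare symbols, which may or may not be acceptable for math-heavy abstracts.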

vectorkt commented 4 years ago

@iacopy

This issue still occurs. I installed pymed through pip as suggested here:

https://pypi.org/project/pymed/

Is the pip package up to date? Should I clone the git directly instead?

Or is this issue not fixed overall?

iacopy commented 4 years ago

@vectorkt yes, the pip package has not been updated, because these merge requests were never merged, so the issue still occurs. This repo currently seems abandoned. If you want some fixes (correct PMIDs, non-truncated texts, English-only abstracts, ...) you can use the fork-fixes branch of my fork. In your virtualenv you can try `pip install -e git://github.com/iacopy/pymed.git@fork-fixes#egg=pymed`, preceded by `pip install requests` if needed. I'm using this myself. Let me know.