ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License
197 stars, 37 forks

Query based on PMIDS #147

Closed: alexmaina closed this issue 7 years ago

alexmaina commented 7 years ago

I would like to bring to your attention that getpapers seems unable to retrieve content by PMID. For example, when I run a query using a PMID:

alex@alex-HP-ProDesk-600-G2-SFF:~$ getpapers -q PMID:27355041 -n -o maina
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
error: Malformed or empty response from EuropePMC. Try running again. Perhaps your query is wrong.

When I run a query using a PMCID, I get the following results:

alex@alex-HP-ProDesk-600-G2-SFF:~$ getpapers -q PMCID:PMC5026053 -n -o maina
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
info: Found 1 open access results

This document lists PMID as a possible search field. Can getpapers search by PMID?

tarrow commented 7 years ago

Thanks for this report.

At first glance this seems to be a problem with EuropePMC (which getpapers depends on).

You'll see that if you search for PMID:27355041 at europepmc.org it comes up blank. Let me investigate further, though.

tarrow commented 7 years ago

It seems that at some point EuropePMC stopped indexing by PMID and started using the field EXT_ID instead (though EXT_ID can also correspond to records from other sources, for example Agricola).

This seems to work for me: getpapers -q EXT_ID:27355041 -o test

Let me know if you have other problems

alexmaina commented 7 years ago

Thanks @tarrow, it works very well, but only when the PMID belongs to a paper that is open access. Does this mean getpapers cannot mine titles and abstracts of PubMed/MEDLINE records that are not open access?

tarrow commented 7 years ago

Sure, it can; you just need to use the -a flag to also get non-open-access content. Often there isn't a fulltext available for these, though.

alexmaina commented 7 years ago

Thanks again. One last question, and by no means the least: I have a dataset of 10 PMIDs.

+------------------+
| accession_number |
+------------------+
| 27747646         |
| 27649863         |
| 27621978         |
| 27478298         |
| 27441216         |
| 27397933         |
| 27386033         |
| 27382606         |
| 27379288         |
| 27355041         |
+------------------+

How can I query all 10 PMIDs in a single script?

tarrow commented 7 years ago

We don't currently have a way to do this, but we probably should.

The easiest thing would be to simply write a bash script to call getpapers several times. Are you on a unix machine?
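
As a rough sketch of that idea (not something built into getpapers), a small shell script could loop over the PMIDs and run one EXT_ID query for each. The function name run_for_pmids, the DRY_RUN switch and the output directory prefix are made up for illustration. The -q, -a and -o flags are the ones already used in this thread. By default the script only prints the commands it would run, so you can check them first.

```shell
#!/bin/sh
# Hypothetical helper: one getpapers call per PMID.
# DRY_RUN defaults to 1 (print the commands only); set DRY_RUN=0 to execute them.

run_for_pmids() {
  prefix=$1   # output directory prefix, e.g. "pmid" gives pmid_27355041
  shift       # the remaining arguments are the PMIDs
  for pmid in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      # Show the command without running it
      echo "getpapers -q EXT_ID:${pmid} -a -o ${prefix}_${pmid}"
    else
      # -a also returns records that are not open access
      getpapers -q "EXT_ID:${pmid}" -a -o "${prefix}_${pmid}"
    fi
  done
}

# The ten PMIDs from the table above
run_for_pmids pmid \
  27747646 27649863 27621978 27478298 27441216 \
  27397933 27386033 27382606 27379288 27355041
```

Saved under a name of your choosing (say fetch_pmids.sh), it could be run with `sh fetch_pmids.sh` to preview the commands, or `DRY_RUN=0 sh fetch_pmids.sh` to actually fetch. Each PMID gets its own output directory, so one failed query does not affect the others. Swapping the EXT_ID: prefix for PMCID: should presumably work the same way for PMCIDs, given the PMCID query shown earlier in the thread.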

tarrow commented 7 years ago

I made a new issue for this: #148

alexmaina commented 7 years ago

Yes, I am on a Unix machine. Is this the same for PMCIDs?