ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

eupmc: allow filter by article type e.g. patents #33

Closed rossmounce closed 9 years ago

rossmounce commented 9 years ago

getpapers -q 'Gasteria' --api eupmc -a -s -l verbose --outdir ./blah

The search above outputs one curious (but technically correct!) folder called "Gasteria plant named 'WT10'", with supplementary data inside (the correct, expected behaviour).

It turns out it's a patent: http://europepmc.org/patents/PAT/US2012102612P

I would like to be able to filter out patents or, conversely, search only patents in EUPMC via getpapers.

blahah commented 9 years ago

You can do this with the Europe PMC API query language, using the PUB_TYPE token:

--query 'PUB_TYPE:"research article"'

However, there isn't a 'patent' article type in EPMC, so I don't know where that's coming from. The full list of article types can be found in the dropdown on EPMC advanced search: http://europepmc.org/advancesearch

blahah commented 9 years ago

Scrap that, it's actually the SRC (as in 'data source') field that includes patents.

--query "SRC:PAT" # search patents
--query "NOT SRC:PAT" # search everything except patents

See the EPMC reference documentation for more details (Appendix 1 has the query syntax, and Section 3 lists the data sources).
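
To tie this back to the original 'Gasteria' search, here is a rough sketch of how the SRC filter might be combined with a search term. The `build_query` helper and its mode names (`only`, `exclude`) are hypothetical and not part of getpapers. It also assumes that Europe PMC accepts parentheses plus AND/NOT to combine a term with a field filter, which is worth checking against the query syntax appendix.

```shell
# Hypothetical helper (not part of getpapers): wraps a search term
# with a Europe PMC SRC filter for patents.
#   $1 = search term
#   $2 = mode: "only" (patents only), "exclude" (no patents),
#        anything else returns the term unchanged
build_query() {
  term="$1"
  mode="$2"
  case "$mode" in
    only)    printf '(%s) AND SRC:PAT\n' "$term" ;;
    exclude) printf '(%s) NOT SRC:PAT\n' "$term" ;;
    *)       printf '%s\n' "$term" ;;
  esac
}

# Print the query strings that would be passed to getpapers.
build_query 'Gasteria' exclude   # prints: (Gasteria) NOT SRC:PAT
build_query 'Gasteria' only      # prints: (Gasteria) AND SRC:PAT

# Possible usage with the command from the top of this issue
# (commented out so the sketch runs without getpapers installed):
# getpapers -q "$(build_query 'Gasteria' exclude)" --api eupmc -a -s -l verbose --outdir ./blah
```

Double-quoting the `$(...)` substitution keeps the whole query as a single argument to `-q`, so the spaces and parentheses are not reinterpreted by the shell.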

rossmounce commented 9 years ago

Brill. Thanks!