ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

eupmc: -p stumbles on BMC OA papers (?) #31

Closed by rossmounce 9 years ago

rossmounce commented 9 years ago

Success: getpapers -q Gasteria --api eupmc -s -l verbose --outdir ./blah

Success: getpapers -q Gasteria --api eupmc -x -l verbose --outdir ./blah

Fail:

$ getpapers -q Gasteria --api eupmc -p   -l verbose  --outdir ./blah
info: Searching using eupmc API
debug: http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=Gasteria%20OPEN_ACCESS%3Ay&resulttype=core
info: Found 13 open access results
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metdata
info: Full EUPMC result metadata written to eupmc_results.json
info: Extracting fulltext HTML URL list (may not be available for all articles)
warn: Article with pmcid "PMC3978243" had no fulltext HTML url
warn: Article with pmcid "PMC3605904" had no fulltext HTML url
warn: Article with pmcid "PMC3371435" had no fulltext HTML url
warn: Article with pmcid "PMC3364152" had no fulltext HTML url
warn: Article with pmcid "PMC3066391" had no fulltext HTML url
warn: Article with pmcid "PMC2871514" had no fulltext HTML url
warn: Article with pmcid "PMC2602195" had no fulltext HTML url
info: Fulltext HTML URL list written to fulltext_html_urls.txt
warn: Article with pmcid "PMC3371435" had no fulltext PDF url
info: Downloading fulltext PDF files
debug: Creating directory: PMC4377467/
debug: Downloading PDF: http://europepmc.org/articles/PMC4377467?pdf=render
debug: Creating directory: PMC3978243/
debug: Downloading PDF: http://europepmc.org/articles/PMC3978243?pdf=render
debug: Creating directory: PMC4152747/
debug: Downloading PDF: http://europepmc.org/articles/PMC4152747?pdf=render
debug: Creating directory: PMC3729011/
debug: Downloading PDF: http://europepmc.org/articles/PMC3729011?pdf=render
debug: Creating directory: PMC3605904/
debug: Downloading PDF: http://europepmc.org/articles/PMC3605904?pdf=render
debug: Creating directory: PMC3371435/
debug: Downloading PDF: http://europepmc.org/articles/PMC3364152?pdf=render
debug: Creating directory: PMC3364152/
debug: Downloading PDF: http://europepmc.org/articles/PMC3305877?pdf=render
debug: Creating directory: PMC3305877/
debug: Downloading PDF: http://europepmc.org/articles/PMC3066391?pdf=render
debug: Creating directory: PMC3066391/
debug: Downloading PDF: http://europepmc.org/articles/PMC2141413?pdf=render
debug: Creating directory: PMC2141413/
debug: Downloading PDF: http://www.biomedcentral.com/content/pdf/1478-5854-9-18.pdf
debug: Creating directory: PMC2871514/
debug: Downloading PDF: http://europepmc.org/articles/PMC2602195?pdf=render
debug: Creating directory: PMC2602195/
debug: Downloading PDF: http://www.biomedcentral.com/content/pdf/1471-2229-10-32.pdf
Downloading files [===---------------------------] 8% (eta 0.0s)
/home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/lib/eupmc.js:333
          fourohfour();
          ^
TypeError: undefined is not a function
    at /home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/lib/eupmc.js:333:11
    at /home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/node_modules/got/index.js:152:6
    at BufferStream.<anonymous> (/home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/node_modules/got/node_modules/read-all-stream/index.js:52:3)
    at BufferStream.emit (events.js:117:20)
    at finishMaybe (/home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/node_modules/got/node_modules/read-all-stream/node_modules/readable-stream/lib/_stream_writable.js:499:14)
    at endWritable (/home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/node_modules/got/node_modules/read-all-stream/node_modules/readable-stream/lib/_stream_writable.js:509:3)
    at BufferStream.Writable.end (/home/ross/.nvm/v0.10.38/lib/node_modules/getpapers/node_modules/got/node_modules/read-all-stream/node_modules/readable-stream/lib/_stream_writable.js:474:5)
    at Unzip.onend (_stream_readable.js:502:10)
    at Unzip.g (events.js:180:16)
    at Unzip.emit (events.js:117:20)

PS Gasteria is one of my favourite genera of succulent plants :)
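For readers tracing the crash: the error is a `TypeError` rather than a `ReferenceError`, so at `eupmc.js:333` the name `fourohfour` was in scope but did not hold a function when the download callback called it. The sketch below is not the getpapers source. `handleDownload`, `onNotFound`, and the return values are hypothetical names, used only to illustrate how a guard could turn this kind of crash into a warning.

```javascript
// Hypothetical sketch, not the getpapers source: all names are illustrative.
// Handles the result of one PDF download and calls onNotFound only if the
// caller actually supplied a function.
function handleDownload(err, url, onNotFound) {
  if (!err) {
    return 'ok';
  }
  if (typeof onNotFound === 'function') {
    onNotFound(url);
    return 'handled';
  }
  // Without the typeof check above, calling an undefined onNotFound
  // would throw a TypeError and abort every remaining download.
  console.warn('warn: download failed and no 404 handler was set: ' + url);
  return 'unhandled';
}
```

With a guard like this, one failed download is logged and the rest of the queue can continue.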

rossmounce commented 9 years ago

This issue may be related to https://github.com/ContentMine/getpapers/issues/25

Notice that the Gasteria search above fails when it tries to download a BMC paper PDF from a non-EPMC link.
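One way to check this observation is to group the logged PDF URLs by host. The sketch below is illustrative only: `isEpmcUrl` and `splitByHost` are hypothetical helpers, not part of getpapers, and the sample URLs are copied from the debug output above.

```javascript
// Hypothetical helpers (not part of getpapers) for grouping PDF URLs by host.
function isEpmcUrl(url) {
  // The WHATWG URL class is a global in current Node.js releases
  return new URL(url).hostname === 'europepmc.org';
}

function splitByHost(urls) {
  const result = { epmc: [], other: [] };
  urls.forEach(function (url) {
    (isEpmcUrl(url) ? result.epmc : result.other).push(url);
  });
  return result;
}

// Sample URLs taken from the debug log in this issue
const loggedUrls = [
  'http://europepmc.org/articles/PMC4377467?pdf=render',
  'http://www.biomedcentral.com/content/pdf/1478-5854-9-18.pdf',
  'http://www.biomedcentral.com/content/pdf/1471-2229-10-32.pdf'
];

// Lists the URLs that do not point at europepmc.org
console.log(splitByHost(loggedUrls).other);
```

Running the same grouping over the URLs from a query that succeeds would show whether the `other` list is empty in that case.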

Other similar searches with PDF download, such as Stegosaurus below, work fine because they return only EPMC links:

getpapers -q Stegosaurus --api eupmc -p   -l verbose  --outdir blah

petermr commented 9 years ago

Try to summarise this so that if it is a EuPMC bug we can relay it to EuPMC (Jo McEntyre)

rossmounce commented 9 years ago

Looks like it could be a EuPMC or BMC bug, yes, rather than a getpapers bug.

blahah commented 9 years ago

This is more likely to be a getpapers bug - I will test it when I get a chance and relay upstream to EPMC if necessary.

blahah commented 9 years ago

The above query now works for me in the current code, so it looks like a recent change has inadvertently fixed it (not surprising as I have worked on urls a bit). When the next release is pushed, could you double-check this is fixed for you?