ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

A bad query to EPMC is not handled well. #89

Open tarrow opened 8 years ago

tarrow commented 8 years ago

See the nasty stack trace below. The response we get from EuropePMC is probably empty or contains a 'malformed query' response. Note that the query is -aardvark. This will definitely not return results, but it may also be invalid. Will investigate.

tom@pisces contentmine % ./getpapers/bin/getpapers.js -q -aardvark -o aardvark -x 
info: Searching using eupmc API
/home/tom/src/contentmine/getpapers/lib/eupmc.js:64
  if(!resp.hitCount[0] || !resp.resultList[0].result) { 
                   ^
TypeError: Cannot read property '0' of undefined
    at EuPmc.completeCallback (/home/tom/src/contentmine/getpapers/lib/eupmc.js:64:20)
    at Request.emit (events.js:110:17)
    at Request.mixin._fireSuccess (/home/tom/src/contentmine/getpapers/node_modules/restler/lib/restler.js:229:10)
    at /home/tom/src/contentmine/getpapers/node_modules/restler/lib/restler.js:161:20
    at /home/tom/src/contentmine/getpapers/node_modules/restler/lib/restler.js:456:9
    at Parser.<anonymous> (/home/tom/src/contentmine/getpapers/node_modules/restler/node_modules/xml2js/lib/xml2js.js:344:20)
    at Parser.emit (events.js:107:17)
    at Object.saxParser.onclosetag (/home/tom/src/contentmine/getpapers/node_modules/restler/node_modules/xml2js/lib/xml2js.js:314:24)
    at emit (/home/tom/src/contentmine/getpapers/node_modules/restler/node_modules/xml2js/node_modules/sax/lib/sax.js:615:33)
    at emitNode (/home/tom/src/contentmine/getpapers/node_modules/restler/node_modules/xml2js/node_modules/sax/lib/sax.js:620:3)
petermr commented 8 years ago

When there are no results, there is usually a "Malformed ..." response, which is very confusing:

localhost:projects pm286$ getpapers -q foobaz -o foobaz -n
info: Searching using eupmc API
info: Running in no-execute mode, so nothing will be downloaded
error: Malformed response from EuropePMC. Try running again.

If the message:

Malformed response from EuropePMC. Try running again

is under our control it should read something like:

EPMC has no results. This could be the simple truth (check spelling etc.) or a malformed query 

Are there any cases where running again is useful? If not, delete the message.

tarrow commented 8 years ago

This will be partially fixed by my patch #89 if we merge it. My new error line reads "Malformed or empty response from EuropePMC. Try running again. Perhaps your query is wrong."

Perhaps this should read more like yours. Say: "Malformed or empty response from EuropePMC. Try running again or perhaps there really are no results"
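
For illustration, here is a minimal sketch of how the "malformed" and "no results" cases could be told apart before any field is dereferenced. It is not the code from the patch. `classifyResponse` is a hypothetical helper, and the response shape (each element wrapped in an array, as in `resp.hitCount[0]` and `resp.resultList[0].result`) is assumed from the expression shown in the stack trace above.

```javascript
// Hypothetical helper, not part of getpapers: classify a parsed EuropePMC
// response before touching its fields.
// Returns 'malformed', 'empty' (zero hits) or 'ok'.
function classifyResponse(resp) {
  // The parsed response itself may be missing, e.g. if the body was not XML.
  if (!resp || typeof resp !== 'object') return 'malformed';

  // Assumed shape: hitCount is an array holding the count as a string.
  if (!Array.isArray(resp.hitCount) || resp.hitCount.length === 0) {
    return 'malformed';
  }
  var hits = parseInt(resp.hitCount[0], 10);
  if (isNaN(hits)) return 'malformed';

  // A well-formed response with zero hits is a legitimate "no results".
  if (hits === 0) return 'empty';

  // With hits reported, a missing result list is treated as malformed.
  var list = Array.isArray(resp.resultList) ? resp.resultList[0] : undefined;
  if (!list || !list.result) return 'malformed';
  return 'ok';
}
```

With a check along these lines, "Try running again" could be shown only for the 'malformed' case, and a plain "no results" message for the 'empty' case.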

tarrow commented 8 years ago

Also, to clarify: we do occasionally get a malformed response from EuropePMC when something goes wrong on their end (e.g. their load balancer redirects to a non-functional server). In that case we should retry, as this usually fixes it. It is also common in places where the network is altered by a WiFi network that wants to push a web login page, etc.
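
A bounded retry along those lines might look roughly like the sketch below. Nothing here is existing getpapers code: `retryRequest`, its option names and the `schedule` hook are all made up for illustration, and the request itself is passed in as a callback-style function so the sketch does not depend on any particular HTTP library.

```javascript
// Hypothetical sketch, not getpapers code: retry a request a bounded number
// of times while the response is unusable.
//   doRequest(cb)  performs one request and calls cb(resp)
//   isUsable(resp) returns true if the response can be processed
//   done(err, resp, attempts) is called once, at the end
function retryRequest(doRequest, isUsable, options, done) {
  var maxAttempts = options.maxAttempts || 3;
  var delayMs = typeof options.delayMs === 'number' ? options.delayMs : 1000;
  // The scheduler is injectable so the wait can be skipped when testing.
  var schedule = options.schedule || function (fn) { setTimeout(fn, delayMs); };
  var attempt = 0;

  function tryOnce() {
    attempt += 1;
    doRequest(function (resp) {
      if (isUsable(resp)) return done(null, resp, attempt);
      if (attempt >= maxAttempts) {
        return done(new Error('Unusable response after ' + attempt + ' attempts'));
      }
      schedule(tryOnce);
    });
  }

  tryOnce();
}
```

Capping the number of attempts matters here: a query that genuinely has no results, or a login page that never goes away, would otherwise be retried forever.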

xguse commented 8 years ago

This is still happening to me. No search terms work for any of the APIs.

Here is my version info and an example with the debug flag on.

(contentmine)
gus at r45012 in ~/tmp/conda/getpapers
$ npm view getpapers version
0.4.5

(contentmine)
gus at r45012 in ~/tmp/conda/getpapers
$ getpapers -q "zika" -o test -x -l debug
info: Searching using eupmc API
debug: http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=zika%20OPEN_ACCESS%3Ay&resulttype=core&pageSize=100&page=1
/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/lib/eupmc.js:66
  if(!resp.hitCount || !resp.hitCount[0] || !resp.resultList[0].result) {
          ^

TypeError: Cannot read property 'hitCount' of undefined
    at EuPmc.completeCallback (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/lib/eupmc.js:66:11)
    at emitTwo (events.js:87:13)
    at Request.emit (events.js:172:7)
    at Request.mixin._fireSuccess (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:229:10)
    at /home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:161:20
    at IncomingMessage.parsers.auto (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:402:7)
    at Request.mixin._encode (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:198:29)
    at /home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:157:16
    at Request.mixin._decode (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:173:7)
    at IncomingMessage.<anonymous> (/home/gus/.anaconda/envs/contentmine/lib/node_modules/getpapers/node_modules/restler/lib/restler.js:150:14)

It might be the proxy of the hospital I work in, but that's kind of a big issue. I expect many of the folks in your target audience for this stuff work in hospitals or other places that demand the use of proxies etc.

I am super bummed about this. I am a postdoc who just switched fields, and having something like this to fast-track getting up to speed with my group would have been more than amazing!

Thanks for your work. I will keep an eye on you guys!

PS: the link provided in the debug log line (http://www.ebi.ac.uk/europepmc/webservices/rest/search/query=zika%20OPEN_ACCESS%3Ay&resulttype=core&pageSize=100&page=1) seems to contain tons of results when I paste it into my browser...
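
One way to narrow this down would be to look at the raw body the tool receives and check whether it is XML at all, or an HTML page injected by a proxy or login portal. The sketch below is only a rough heuristic under that assumption. `sniffBody` is a hypothetical helper, not something getpapers provides.

```javascript
// Hypothetical helper, not part of getpapers: guess whether a raw response
// body is XML or an HTML page (e.g. from a proxy or web login portal).
// Returns 'empty', 'html', 'xml' or 'unknown'.
function sniffBody(body) {
  if (typeof body !== 'string' || body.trim() === '') return 'empty';

  // Only the start of the body is inspected; this is a heuristic, not a parser.
  var head = body.trim().slice(0, 200).toLowerCase();

  if (head.indexOf('<!doctype html') === 0 || head.indexOf('<html') !== -1) {
    return 'html';
  }
  if (head.indexOf('<?xml') === 0 || head.charAt(0) === '<') return 'xml';
  return 'unknown';
}
```

If the body comes back as 'html' or 'unknown' while the same URL shows results in a browser, the proxy is a likely suspect.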

tarrow commented 8 years ago

Thanks for flagging this up again. It seems that I overlooked yet another situation where we get an odd response from EuropePMC.

Out of interest, did you try running it again after successfully seeing all the results in the web browser?

petermr commented 8 years ago

@xguse we'll try to sort out your problems, and we're happy to find out what you want to search for.