ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

Malformed EUPMC query doesn't terminate, nor give error message #143

Open rossmounce opened 7 years ago

rossmounce commented 7 years ago

Perhaps related to #114: I noticed recently that a subtly malformed EUPMC query gets stuck at "Searching using eupmc API" without ever terminating or giving an error message. This could lead some users to wait 10 minutes, assuming it's loading a big query.

The user error causing this is a missing colon. The query should contain (FIRST_PDATE:[2013-01-01 TO 2016-12-05]), but (FIRST_PDATE[2013-01-01 TO 2016-12-05]) was given instead.

It would be nice (although perhaps hard) if this could be detected and an informative error message supplied.
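One way such a check might look, as a rough sketch only: a client-side heuristic that flags an upper-case field name followed directly by "[" with no ":" in between. The helper name find_missing_colons and the regular expression are assumptions made for illustration; they are not part of getpapers or the EuPMC API, and the pattern only covers this one mistake.

```python
import re
from typing import List

# Hypothetical helper, not part of getpapers: matches an upper-case field
# name (letters and underscores) followed directly by "[" with no ":".
_MISSING_COLON = re.compile(r"\b([A-Z][A-Z_]*)\s*\[")


def find_missing_colons(query: str) -> List[str]:
    """Return field names that look like range searches missing a colon."""
    return _MISSING_COLON.findall(query)


bad = '"arbuscular mycorrhizae" (FIRST_PDATE[2013-01-01 TO 2016-12-05])'
good = '"arbuscular mycorrhizae" (FIRST_PDATE:[2013-01-01 TO 2016-12-05])'

print(find_missing_colons(bad))   # ['FIRST_PDATE']
print(find_missing_colons(good))  # []
```

A check like this could only warn rather than reject, since it is a heuristic and could misfire on upper-case text that happens to precede a bracket inside a quoted phrase.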

getpapers -V
0.4.10

getpapers -q '"arbuscular mycorrhizae" (FIRST_PDATE[2013-01-01 TO 2016-12-05])' -o malformedquery
info: Searching using eupmc API

blahah commented 7 years ago

I think we have to rely on EuPMC to tell us that a query is malformed. It's possible that they already do this and we aren't handling the response properly.

tarrow commented 7 years ago

How long did it run for?

Basically the problem we have/had is that (with a changing likelihood of it happening) EuPMC sometimes responds with an error page or simply no results when there are actually results to be delivered.

For this reason we now retry very aggressively until we get a successful response, but not forever. I would expect it to fail eventually, probably in under 10 minutes but perhaps longer.
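The "retry, but not forever" behaviour described above can be sketched roughly as follows. This is an illustration and not the getpapers implementation: fetch_with_retries, its parameters, and the exponential delay between attempts are all assumptions; the thread only says that retries are capped.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], Optional[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until it returns a non-None value or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts:
            # Assumed policy: wait twice as long after each failed attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"No usable response after {max_attempts} attempts; "
        "the query may be malformed."
    )


# Usage with a stand-in fetch that fails twice and then succeeds.
calls = []


def flaky_fetch() -> Optional[str]:
    calls.append(1)
    return "ok" if len(calls) == 3 else None


# sleep is replaced with a no-op so the demonstration does not wait.
print(fetch_with_retries(flaky_fetch, sleep=lambda seconds: None))  # ok
```

Capping the attempts and raising an error that mentions the query means a malformed query fails with a hint after a bounded wait instead of appearing to hang.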

I'd like to retry less aggressively, but I was going to wait and see if we have a few more months of relative stability before I turn it down.

rossmounce commented 7 years ago

My impression was that it would run indefinitely. But I am running it again now with time to see if it does in fact terminate at some point...

rossmounce commented 7 years ago

Apologies, it seems I was being impatient:

time getpapers -q '"arbuscular mycorrhizae" (FIRST_PDATE[2013-01-01 TO 2016-12-05])' -o malformedquery
info: Searching using eupmc API
warn: We had to retry the last request 50 times.
error: Malformed or empty response from EuropePMC. Try running again. Perhaps your query is wrong.

real    4m11.392s
user    0m1.376s
sys     0m0.104s

tarrow commented 7 years ago

I'll now knock this down to 5 times.

The aggressive retrying is now causing a problem itself, because EuPMC are now reporting an incorrect number of results, even within a single page. For example, say EuPMC reports 10 results but only returns 9; we then retry forever trying to get that 'last' one that isn't there.

I reported this bug to them before but obviously it has reoccurred.
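The count mismatch described above (10 results reported, 9 returned) suggests not treating the reported total as the only stopping condition. A minimal sketch under stated assumptions: collect_results and fetch_page are hypothetical names, and the page-number interface is a stand-in rather than the real EuPMC API.

```python
from typing import Callable, Dict, List


def collect_results(
    fetch_page: Callable[[int], List[Dict]],
    reported_total: int,
    max_pages: int = 1000,
) -> List[Dict]:
    """Gather pages until the server has nothing more to give."""
    results: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            # An empty page ends the loop, whatever the reported total says.
            break
        results.extend(batch)
        if len(results) >= reported_total:
            break
    if len(results) < reported_total:
        print(
            f"warn: expected {reported_total} results "
            f"but received {len(results)}"
        )
    return results


# Stand-in server: reports 10 results but can only deliver 9.
fake_pages = {1: [{"id": n} for n in range(9)]}

papers = collect_results(lambda page: fake_pages.get(page, []), reported_total=10)
print(len(papers))  # 9
```

Stopping on an empty page and warning about the shortfall keeps the results that did arrive instead of retrying for a record that does not exist.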