ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License
197 stars 37 forks source link

Added option to limit number of hits #75

Closed tarrow closed 8 years ago

tarrow commented 8 years ago

This is a patch to limit the number of hits with -k or --limit. It also fixes a possible bug #74

blahah commented 8 years ago

cc #65

blahah commented 8 years ago

Great! Please make sure it applies to all the APIs.

tarrow commented 8 years ago

Whoops, I sort of over looked this. Should I be making an arxiv patch on master or on the arxiv update branch?

blahah commented 8 years ago

I've now merged the arxiv branch into master, so you can fetch the changes and work directly in this PR

tarrow commented 8 years ago

Great; I'll get right on that!

On Wed, Feb 24, 2016 at 11:37 AM, Richard Smith-Unna < notifications@github.com> wrote:

I've now merged the arxiv branch into master, so you can fetch the changes and work directly in this PR

— Reply to this email directly or view it on GitHub https://github.com/ContentMine/getpapers/pull/75#issuecomment-188213228.

tarrow commented 8 years ago

you'll notice in my PR I changed the pageSize param we send to EuropePMC to 100. the default is 25 but it can be as many as 1000. Do you have any thoughts as to what we should set it to? A larger number means fewer requests but it takes EuropePMC longer to respond. At 1000 we can hit the timeout. It also means the progress bar has much less granularity.

This is in this pull request because obviously we need to know the page size to page it.

tarrow commented 8 years ago

If you could have a look at this for me @Blahah that would be great. I think it is ok; I'm always keen to get back any criticism you might have about style or logic though.

It should address all apis and enable -k to limit the number of papers.

It also closes #74.

petermr commented 8 years ago

This would be super valuable. In demos it's useful to limit demo to (say) 200 hits.

tarrow commented 8 years ago

Subject to code-review by @Blahah this code should now all be ready to go.

Edit: as per my above. I thought this was going to the bug report not the PR. This wasn't meant to be a repeat.

On Thu, Feb 25, 2016 at 11:46 AM, petermr notifications@github.com wrote:

This would be super valuable. In demos it's useful to limit demo to (say) 200 hits.

— Reply to this email directly or view it on GitHub https://github.com/ContentMine/getpapers/pull/75#issuecomment-188748927.

blahah commented 8 years ago

@tarrow sorry it took me a while to get to this.

RE the page size for eupmc - 100 is fine for now. I set the ArXiv one by experimenting with a range of values and choosing the largest one that never timed out in a large number of test runs. But it's probably internet connection speed dependent - we may have to revisit later.

Code all looks good - great job!