ContentMine / getpapers

Get metadata, fulltexts or fulltext URLs of papers matching a search query
MIT License

Paging problem with arxiv #78

Closed (tarrow closed this issue 8 years ago)

tarrow commented 8 years ago

Our paging for arXiv has a problem that makes us download many duplicate results. The max_results parameter sets how many results are returned per page, and the start parameter sets the index of the first result to return, not the page number.
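
To illustrate the difference between a result offset and a page number, here is a minimal sketch in Python (not the getpapers code itself). The function name page_urls and the example query are made up for illustration; the start and max_results parameters are the ones described above, and the endpoint is assumed to be the public arXiv query API.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public arXiv query API.
ARXIV_API = "http://export.arxiv.org/api/query"


def page_urls(query, total, page_size):
    """Build one request URL per page (hypothetical helper).

    `start` is the index of the first result on the page, so it must
    advance by `page_size` each time (0, 10, 20, ...), not by 1.
    """
    urls = []
    for start in range(0, total, page_size):
        params = {
            "search_query": query,
            "start": start,
            # The last page may need fewer than page_size results.
            "max_results": min(page_size, total - start),
        }
        urls.append(ARXIV_API + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in page_urls("all:electron", total=25, page_size=10):
        print(url)
```

If start were incremented by 1 per page instead, each page would overlap the previous one by page_size - 1 results, which is the kind of duplication described here.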

We also don't test for duplicates and filter them out as we do with EuropePMC; we could consider doing that here too.
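
A duplicate filter could look roughly like the sketch below. This is only an illustration in Python, not how the EuropePMC filtering is actually implemented; the dedupe function and the "id" field name are assumptions.

```python
def dedupe(results, key="id"):
    """Return results with repeated identifiers dropped (hypothetical helper).

    Keeps the first occurrence of each identifier and preserves order.
    The "id" field name is an assumption about the result records.
    """
    seen = set()
    unique = []
    for result in results:
        identifier = result[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(result)
    return unique


if __name__ == "__main__":
    # Made-up records standing in for results from overlapping pages.
    pages = [{"id": "a"}, {"id": "b"}, {"id": "b"}, {"id": "c"}, {"id": "a"}]
    print(dedupe(pages))
```

Filtering like this would hide the symptom even if paging overlapped, so it is a safety net rather than a replacement for correct start offsets.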

tarrow commented 8 years ago

This was already fixed by the merge today. My mistake :)