adsabs / adsabs-dev-api

Developer API service description and example client code

Rate limit was exceeded #26

Closed: JohannesBuchner closed this issue 8 years ago

JohannesBuchner commented 8 years ago

I get "Rate limit was exceeded" even though I am nowhere near the limit:

< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4864
< X-RateLimit-Reset: 1458000000

jonnybazookatone commented 8 years ago

Can you post some of your source code so we can see what you're trying to do? If you're using the client (by Andy Casey), it's possible you're accessing other endpoints that have separate rate limits, which may be giving you this response.

JohannesBuchner commented 8 years ago

Yes, I was running his beers-for-cites.py script.

dwillcox commented 8 years ago

I had similar symptoms when running a script that uses Andy Casey's ads Python module to issue repeated queries for DOI or arXiv identifiers and bibtex, even though (as shown below) I'm not close to the limit. The script I'm running is https://github.com/dwillcox/pybib/blob/master/pybib.py

Simplified Code

# method of the Document class in pybib.py (simplified)
def query_ads(self, query):
    paper_query = ads.SearchQuery(**query)
    paper_list = []
    for p in paper_query:
        paper_list.append(p)
    self.paper = paper_list[0]
    self.bibtex = self.paper.bibtex  # fetching bibtex issues a separate export request

self.query_ads({'identifier': 'arXiv:1507.01927'})

Traceback

Traceback (most recent call last):
  File "/home/eugene/codes/astro/pybib/pybib.py", line 396, in <module>
    dc = DocumentCollection(args.infiles)
  File "/home/eugene/codes/astro/pybib/pybib.py", line 311, in __init__
    self.documents = [Document(f) for f in files]
  File "/home/eugene/codes/astro/pybib/pybib.py", line 197, in __init__
    self.query_ads({'identifier':self.arxiv})
  File "/home/eugene/codes/astro/pybib/pybib.py", line 267, in query_ads
    self.bibtex = self.paper.bibtex
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/werkzeug/utils.py", line 73, in __get__
    value = self.func(obj)
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/search.py", line 231, in bibtex
    return ExportQuery(bibcodes=self.bibcode, format="bibtex").execute()
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/export.py", line 62, in execute
    self.session.post(url, data=self.json_payload)
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/base.py", line 39, in load_http_response
    raise APIResponseError(HTTPResponse.text)
ads.exceptions.APIResponseError: u'Rate limit was exceeded'

Rate Limits after Traceback

< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4896
< X-RateLimit-Reset: 1458777600
dwillcox commented 8 years ago

Ah, okay. I looked at the ads code a little more, and since I get this error when trying to get bibtex, I see the ads module is accessing https://api.adsabs.harvard.edu/v1/export/bibtex

Doing:

curl -v -H "Authorization: Bearer [my token]" 'https://api.adsabs.harvard.edu/v1/export/bibtex'

Returns:

[...]
HTTP/1.1 429 TOO MANY REQUESTS
[...]
< X-RateLimit-Limit: 100
< X-RateLimit-Remaining: 0
< X-RateLimit-Reset: 1459036800
[...]

Thanks, @jonnybazookatone is correct: in my case, I can make only 100 daily requests for bibtex. Once I've generated a bibliography for my existing collection of PDFs, that's not so bad, since I'm unlikely to add 100 articles per day to my collection just yet!

I don't know which endpoints are called by the beers-for-cites script, but the API URLs you can try with the curl -v -H ... command are listed here: https://github.com/andycasey/ads/blob/4871fc6c272f872a8324e24881b88ac4511ea544/ads/config.py

JohannesBuchner commented 8 years ago

Could the 100 be increased to 1000 or 400? I tried to programmatically build a bibtex collection of all papers that cite me, and ran into that problem.

romanchyla commented 8 years ago

The bibtex endpoint accepts a list of identifiers, so you should send a batch of identifiers (1 request instead of N requests with 1 bibcode each). If the beers-for-cites script makes one request per bibcode, I'd suggest making a PR to address that problem there.

jonnybazookatone commented 8 years ago

Yes, @dwillcox, you're hitting your API limit on the /export endpoint when you use ExportQuery().execute(). As @romanchyla points out, you're better off first collecting your bibcodes and then sending them to ExportQuery in one go. For example,

q = ads.SearchQuery(q='star', fl=['id', 'bibcode'])
bibcodes = [article.bibcode for article in q]

bibtex_query = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()

This requires 1 request, and so will only reduce your remaining /export quota by 1. If you do it for each bibcode separately, you'll obviously reach the limit at 100 bibcodes.

jonnybazookatone commented 8 years ago

@JohannesBuchner, so you're best off collecting the bibcodes you need and then sending them to the export service, as I've shown in the response to @dwillcox. If you simply want to find all the papers that cite your papers, you can do this using the citations() operator. For example,

q = ads.SearchQuery(q='citations(first_author:"Buchner, J." database:astronomy)', fl=['id', 'bibcode'])

This will return all the bibcodes that cite papers with first author "Buchner, J." in our Astronomy database, using 1 request (and 1 more if you want to export them). However, this will run into author ambiguity, as it looks like there are a lot of other J. Buchners around. If you're only interested in yourself and who cites you, you could also think about using ORCiD to identify the papers that are yours. You would then simply do:

q = ads.SearchQuery(q='citations(orcid:0000-0001-8043-4965)', fl=['id', 'bibcode'])

and then send them to export. You can read about ORCiD here: http://adsabs.github.io/help/orcid/claiming-papers

Do note I've only requested 'id' and 'bibcode'. If you start accessing other contents via the attributes, e.g. Article.title, this will send another request and use up your rate limit. To avoid this, you can request more fields on your initial query, fl=['id', 'bibcode', 'title'], and so on.
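For example, a minimal sketch extending the ORCiD query above (reading title should not trigger any further requests, since it was included in fl):

q = ads.SearchQuery(q='citations(orcid:0000-0001-8043-4965)', fl=['id', 'bibcode', 'title'])
for article in q:
    # 'title' came back with the search results, so this costs no extra requests
    print article.bibcode, article.title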

Finally, I would say this issue is mostly related to these two: https://github.com/andycasey/ads/issues/43 and https://github.com/andycasey/ads/issues/44. So keep an eye on those for changes that may help you avoid hitting rate limits due to unoptimised code.

dwillcox commented 8 years ago

Thanks so much for the quick replies and clear explanations, @romanchyla and @jonnybazookatone!

I didn't realize I could pass a list of bibcodes into ads.ExportQuery(); that's very helpful, and I will change my script to do so.
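Something like this rough sketch is what I'm planning (the second identifier is just a placeholder):

import ads

identifiers = ['arXiv:1507.01927', '10.1000/example.doi']  # placeholder list from pdfgrep

# one search request per identifier (against the 5000/day search limit)
bibcodes = []
for ident in identifiers:
    papers = list(ads.SearchQuery(identifier=ident, fl=['id', 'bibcode']))
    if papers:
        bibcodes.append(papers[0].bibcode)

# one batched request to the export endpoint (against the 100/day limit)
bibtex = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()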

My script calls pdfgrep to get DOI or arXiv IDs from PDFs and then builds a bibtex bibliography for me, and your API saves a lot of time. Thanks for the great work!

JohannesBuchner commented 8 years ago

That sounds like a reasonable solution. To try it out, I ran the following in Bumblebee:

citations(orcid:"0000-0003-0426-6634" AND first_author:"Buchner, J.")

Unfortunately, this returns many papers that are somehow related but, if you click into them, actually do not cite me ...

jonnybazookatone commented 8 years ago

Can you give some examples? I looked at the first 10 documents returned and they all have citations to you in their References section.

Also, the numbers seem to make sense at a quick glance. When you search without citations(), you see you have two papers, with 35 and 37 citations, and with citations() it returns 65 documents. The reduced number is due to papers that cite both of them, for example 2015MNRAS.453.1946G.

If you also want to know your total citation_count (i.e., 35+37=72), then you can send your ORCiD bibcode list to the metrics endpoint:

q = ads.SearchQuery(q='orcid:0000-0003-0426-6634 first_author:"Buchner, J."', fl=['id', 'bibcode'])
bibcodes = [article.bibcode for article in q]

bibtex = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()

metrics = ads.MetricsQuery(bibcodes=bibcodes).execute()
print metrics['citation stats']['total number of citations']
>>> 72

This is 3 requests.

jonnybazookatone commented 8 years ago

Or, if you want to know which papers cite which document, then you can also request the citation list on your first query:

q = ads.SearchQuery(q='orcid:0000-0003-0426-6634 first_author:"Buchner, J."', fl=['id', 'bibcode', 'citation'])
articles = [article for article in q]
print articles[0].citation
>>> [u'2014A&A...571A..34V',
>>> ....
>>> u'2016PhRvL.116e1103L']

Then you can loop through them and count them. It really depends on how you want to use it and how it might change in the future.
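Continuing from the articles list above, a rough sketch of that counting (assuming the citation attribute is None for papers with no citations):

# count citations per paper and in total, using only the fields already returned
for article in articles:
    print article.bibcode, len(article.citation or [])
print 'total:', sum(len(a.citation or []) for a in articles)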

jonnybazookatone commented 8 years ago

I'll close this ticket as the rate limits are behaving as expected. If you have any more questions on the topic, feel free to ask them here.