JohannesBuchner closed this issue 8 years ago.
Can you post some of your source code so we can see what you're trying to do? If you're using the client (by Andy Casey), it's possible you're accessing other end points that have separate rate limits, and may be giving you this response.
Yes, I was running his beers-for-cites.py script
I had similar symptoms when running a script that uses Andy Casey's ads python module to issue repeated queries for DOI or arXiv identifiers and bibtex, even though (see below) I'm not close to the limit. The script I'm running is https://github.com/dwillcox/pybib/blob/master/pybib.py
```python
def query_ads(self, query):
    paper_query = ads.SearchQuery(**query)
    paper_list = []
    for p in paper_query:
        paper_list.append(p)
    self.paper = paper_list[0]
    self.bibtex = self.paper.bibtex

self.query_ads({'identifier': 'arXiv:1507.01927'})
```
```
Traceback (most recent call last):
  File "/home/eugene/codes/astro/pybib/pybib.py", line 396, in <module>
    dc = DocumentCollection(args.infiles)
  File "/home/eugene/codes/astro/pybib/pybib.py", line 311, in __init__
    self.documents = [Document(f) for f in files]
  File "/home/eugene/codes/astro/pybib/pybib.py", line 197, in __init__
    self.query_ads({'identifier':self.arxiv})
  File "/home/eugene/codes/astro/pybib/pybib.py", line 267, in query_ads
    self.bibtex = self.paper.bibtex
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/werkzeug/utils.py", line 73, in __get__
    value = self.func(obj)
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/search.py", line 231, in bibtex
    return ExportQuery(bibcodes=self.bibcode, format="bibtex").execute()
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/export.py", line 62, in execute
    self.session.post(url, data=self.json_payload)
  File "/home/eugene/local/anaconda2/lib/python2.7/site-packages/ads/base.py", line 39, in load_http_response
    raise APIResponseError(HTTPResponse.text)
ads.exceptions.APIResponseError: u'Rate limit was exceeded'
```
```
< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4896
< X-RateLimit-Reset: 1458777600
```
Ah, okay. I looked at the ads code a little more, and since I get this error when trying to get bibtex, I see the ads module is accessing https://api.adsabs.harvard.edu/v1/export/bibtex
Doing:

```
curl -v -H "Authorization: Bearer [my token]" 'https://api.adsabs.harvard.edu/v1/export/bibtex'
```

Returns:

```
[...]
HTTP/1.1 429 TOO MANY REQUESTS
[...]
< X-RateLimit-Limit: 100
< X-RateLimit-Remaining: 0
< X-RateLimit-Reset: 1459036800
[...]
```
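If you don't want to eyeball the curl transcript each time, the `X-RateLimit-*` headers can be parsed out programmatically. Below is a small sketch; `parse_rate_limit` is a hypothetical helper (not part of the ads module or the API), and the reset value is a Unix timestamp in UTC:

```python
# Hypothetical helper: parse the X-RateLimit-* headers out of a
# `curl -v` transcript to check how much daily quota remains.
import re
from datetime import datetime, timezone

def parse_rate_limit(transcript):
    """Return the X-RateLimit-* header values as a dict of ints."""
    limits = {}
    for match in re.finditer(r'X-RateLimit-(\w+):\s*(\d+)', transcript):
        limits[match.group(1)] = int(match.group(2))
    return limits

headers = """\
< X-RateLimit-Limit: 100
< X-RateLimit-Remaining: 0
< X-RateLimit-Reset: 1459036800
"""
info = parse_rate_limit(headers)
print(info['Remaining'])  # 0
# The Reset header is a Unix timestamp (UTC) for when the quota refills:
print(datetime.fromtimestamp(info['Reset'], tz=timezone.utc).isoformat())
# 2016-03-27T00:00:00+00:00
```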
Thanks, @jonnybazookatone is correct in my case: I can make only 100 daily requests for bibtex. Now that I've generated a bibliography for my existing collection of PDFs, it's not so bad, since I'm unlikely to add 100 articles a day to my collection just yet!
I don't know which endpoints are called by the beers-for-cites script, but the API URLs you can try with the curl command above are listed here: https://github.com/andycasey/ads/blob/4871fc6c272f872a8324e24881b88ac4511ea544/ads/config.py
Could the 100 be increased to 1000 or 400? I tried to programmatically build a bibtex collection of all papers that cite me, and ran into that problem.
The bibtex endpoint accepts a list of identifiers, so you should send a batch of identifiers (1 request, instead of N requests with 1 bibcode each). If the beers-for-cites script has the one-request-per-bibcode behaviour, I'd suggest making a PR to address that problem there.
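The batching idea above can be sketched as follows. The batch size of 100 here is an arbitrary illustrative choice, not a documented API limit:

```python
# Sketch of batching: instead of one /export request per bibcode,
# group the bibcodes and send one request per group.
def chunked(items, size=100):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Per-bibcode requests would cost 250 requests for 250 bibcodes;
# batched, they cost only ceil(250 / 100) = 3 requests.
bibcodes = ['bibcode%d' % i for i in range(250)]
batches = chunked(bibcodes)
print(len(batches))  # 3
# Each batch would then go to ads.ExportQuery(bibcodes=batch, format='bibtex')
```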
Yes, @dwillcox, you're hitting your API limit on the /export end point when you're using ExportQuery().execute(). You're better off, as @romanchyla points out, first collecting your bibcodes and then sending them to the export query. For example:

```python
q = ads.SearchQuery(q='star', fl=['id', 'bibcode'])
bibcodes = [article.bibcode for article in q]
bibtex_query = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()
```

This requires 1 request, and so will only reduce your rate limit on /export by 1. If you export each bibcode separately, you'll obviously reach the limit at 100 bibcodes.
@JohannesBuchner, so you're best off collecting the bibcodes you need and then sending them to the export service, as I've shown in the response to @dwillcox. If you simply want to find all the papers that cite your papers, you can do this using the citations() operator. For example:

```python
q = ads.SearchQuery(q='citations(first_author:"Buchner, J." database:astronomy)', fl=['id', 'bibcode'])
```

This will return all the bibcodes that cite papers with first author "Buchner, J." in our Astronomy database, using 1 request (and 1 more if you want to export them). However, this will run into author disambiguation, as it looks like there are a lot more J. Buchners around. If you're only interested in yourself, and who cites you, you could also think about using ORCiD to identify papers that are yours. You would then simply do:

```python
q = ads.SearchQuery(q='citations(orcid:0000-0001-8043-4965)', fl=['id', 'bibcode'])
```

and then send them to export. You can read about ORCiD here: http://adsabs.github.io/help/orcid/claiming-papers
Do note I've only requested 'id' and 'bibcode'; if you start accessing other contents via the attributes, i.e., Article.title, this will send another request and use up your rate limit. To avoid this, you can request more field parameters in your initial request, fl=['id', 'bibcode', 'title'], and so on.
Finally, I would say this issue is mostly related to these two: https://github.com/andycasey/ads/issues/43 and https://github.com/andycasey/ads/issues/44, so keep an eye on those for changes that may help you avoid hitting rate limits due to unoptimised code.
Thanks so much for the quick replies and clear explanations, @romanchyla and @jonnybazookatone!
I didn't realize I could pass a list of bibcodes into ads.ExportQuery(); that's very helpful, and I will change my script to do so.
My script calls pdfgrep to get DOI or arXiv IDs from PDFs and then builds a bibtex bibliography for me, and your API saves a lot of time. Thanks for the great work!
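The identifier-matching step of such a script can be sketched like this, assuming text has already been extracted from the PDF (e.g. via pdfgrep or pdftotext). The regexes below are simplified illustrations, not the patterns pybib.py actually uses:

```python
# Illustrative sketch of matching arXiv IDs and DOIs in extracted text.
# These patterns are simplified examples, not pybib's actual regexes.
import re

ARXIV_RE = re.compile(r'arXiv:\d{4}\.\d{4,5}')
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s]+')

def find_identifiers(text):
    """Return arXiv IDs and DOIs found in a page of extracted text."""
    return ARXIV_RE.findall(text) + DOI_RE.findall(text)

page = "Preprint at arXiv:1507.01927, published as doi 10.1088/0004-637X/802/1/1"
print(find_identifiers(page))
# ['arXiv:1507.01927', '10.1088/0004-637X/802/1/1']
```

Either kind of identifier can then be fed to ads.SearchQuery to resolve a bibcode.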
That sounds like a reasonable solution. To try it out, I ran the following in Bumblebee:

```
citations(orcid:"0000-0003-0426-6634" AND first_author:"Buchner, J.")
```
Unfortunately, this returns many papers that are somehow related but actually do not cite me if you click into them ...
Can you give some examples? I looked at the first 10 documents returned and they all have citations to you in their References section.
Also, a quick glance at the numbers seems to make sense. When you search without citations(), you see you have two papers with 35 and 37 citations, and with citations() it returns 65 documents. The reduced number is due to papers that cite both of them, for example: 2015MNRAS.453.1946G.
If you also want to know your total citation_count (i.e., 35+37=72), then you can send your ORCiD bibcode list to the metrics end point:
```python
q = ads.SearchQuery(q='orcid:0000-0003-0426-6634 first_author:"Buchner, J."', fl=['id', 'bibcode'])
bibcodes = [article.bibcode for article in q]
bibtex = ads.ExportQuery(bibcodes=bibcodes, format='bibtex').execute()
metrics = ads.MetricsQuery(bibcodes=bibcodes).execute()
print metrics['citation stats']['total number of citations']
>>> 72
```

This is 3 requests.
Or, if you want to know which papers cite which document, then you can also request the citation list on your first query:

```python
q = ads.SearchQuery(q='orcid:0000-0003-0426-6634 first_author:"Buchner, J."', fl=['id', 'bibcode', 'citation'])
articles = [article for article in q]
print articles[0].citation
>>> [u'2014A&A...571A..34V',
>>>  ....
>>>  u'2016PhRvL.116e1103L']
```
Then you can loop through them and count them. It really depends on how you want to use it and change it in the future.
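The counting step above might look like the sketch below, working from the per-paper citation lists returned by the search. The cited-paper bibcodes here are made up for illustration:

```python
# Sketch: tally citations per paper and overall from the `citation`
# field of each article. The data below is illustrative only.
from itertools import chain

citation_lists = {
    '2014A&A...564A.125B': ['2014A&A...571A..34V', '2016PhRvL.116e1103L'],
    '2015ApJ...802...89B': ['2016PhRvL.116e1103L'],
}

per_paper = {bibcode: len(cites) for bibcode, cites in citation_lists.items()}
total = sum(per_paper.values())  # counts a citing paper once per cited paper
# Distinct citing papers (a paper citing both counts once):
unique = len(set(chain.from_iterable(citation_lists.values())))

print(per_paper, total, unique)
```

The gap between `total` and `unique` is exactly the "papers that cite both" effect described above.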
I'll close this ticket as the rate limits are behaving as expected. If you have any more questions on the topic, feel free to ask them here.
I get "Rate limit was exceeded" even though I am nowhere near the limit:
```
< X-RateLimit-Limit: 5000
< X-RateLimit-Remaining: 4864
< X-RateLimit-Reset: 1458000000
```