Open marccarre opened 2 years ago
By the way, doesn't WikiData provide rate limit header fields? If it has them we could intelligently control the request rate from client side.
I looked for this too but, unless I missed something, couldn't find such a header in the response:
{'accept-ch': 'Sec-CH-UA-Arch,Sec-CH-UA-Bitness,Sec-CH-UA-Full-Version-List,Sec-CH-UA-Model,Sec-CH-UA-Platform-Version',
'accept-ranges': 'bytes',
'access-control-allow-origin': '*',
'age': '1',
'cache-control': 'public, max-age=300',
'content-encoding': 'gzip',
'content-type': 'application/sparql-results+json;charset=utf-8',
'date': 'Fri, 22 Jul 2022 09:33:06 GMT',
'nel': '{ "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, '
'"success_fraction": 0.0}',
'permissions-policy': 'interest-cohort=(),ch-ua-arch=(self '
'"intake-analytics.wikimedia.org"),ch-ua-bitness=(self '
'"intake-analytics.wikimedia.org"),ch-ua-full-version-list=(self '
'"intake-analytics.wikimedia.org"),ch-ua-model=(self '
'"intake-analytics.wikimedia.org"),ch-ua-platform-version=(self '
'"intake-analytics.wikimedia.org")',
'report-to': '{ "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": '
'"https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" '
'}] }',
'server': 'nginx/1.14.2',
'server-timing': 'cache;desc="pass", host;desc="cp5008"',
'set-cookie': 'WMF-Last-Access=22-Jul-2022;Path=/;HttpOnly;secure;Expires=Tue, '
'23 Aug 2022 00:00:00 GMT, '
'WMF-Last-Access-Global=22-Jul-2022;Path=/;Domain=.wikidata.org;HttpOnly;secure;Expires=Tue, '
'23 Aug 2022 00:00:00 GMT',
'strict-transport-security': 'max-age=106384710; includeSubDomains; preload',
'transfer-encoding': 'chunked',
'vary': 'Accept, Accept-Encoding',
'x-cache': 'cp5009 miss, cp5008 pass',
'x-cache-status': 'pass',
'x-client-ip': '***.***.***.***',
'x-first-solution-millis': '48',
'x-served-by': 'wdqs2001'}
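(For reference, a dump like the one above can be produced with something along these lines; the SPARQL query is just a placeholder and not necessarily how I obtained it:)

```python
import pprint

import requests

# Placeholder SPARQL query; any WDQS request would do for inspecting headers.
response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1",
            "format": "json"},
)

# Dump every response header to check for rate-limit-related fields.
pprint.pprint(dict(response.headers))
```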
There seems to be only this recommendation on the request rate: https://wikitech.wikimedia.org/wiki/Robot_policy#Request_rate
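So with no rate-limit header to react to, the best a client can probably do is pace itself to roughly the recommended rate and back off when a 429 slips through anyway. A rough sketch of that idea (the User-Agent string, interval, and retry values below are placeholders, not anything the library or WDQS prescribes):

```python
import time

import requests

WDQS_URL = "https://query.wikidata.org/sparql"
# A descriptive User-Agent per https://meta.wikimedia.org/wiki/User-Agent_policy;
# the value below is a placeholder, not anything this library currently sends.
HEADERS = {"User-Agent": "my-batch-job/0.1 (https://example.org/contact; me@example.org)"}


def run_queries(queries, min_interval=1.0, max_retries=5):
    """Run SPARQL queries sequentially, pacing requests and backing off on 429."""
    results = []
    for query in queries:
        for attempt in range(max_retries):
            response = requests.get(
                WDQS_URL,
                params={"query": query, "format": "json"},
                headers=HEADERS,
            )
            if response.status_code == 429:
                # Rate limited: wait exponentially longer before retrying.
                time.sleep(min_interval * 2 ** attempt)
                continue
            response.raise_for_status()
            results.append(response.json())
            break
        else:
            raise RuntimeError(f"Still rate-limited after {max_retries} retries")
        time.sleep(min_interval)  # stay roughly within the recommended request rate
    return results
```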
Thank you for letting me know that! 🙏🏼
Version
0.7.0
Problem
When sending several (10 to 100) requests in a row, some requests fail non-deterministically.
Upon closer investigation, the actual response is a 429 with the message: "Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy".
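(This can be observed by hitting the endpoint directly, bypassing the library; the query and burst size below are just placeholders:)

```python
import requests

# Fire a burst of identical queries (the SPARQL is a placeholder) without a
# descriptive User-Agent and print the first throttled response.
for i in range(50):
    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1",
                "format": "json"},
    )
    if response.status_code == 429:
        print(i, response.status_code, response.text)
        break
```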
Root cause
This library doesn't follow Wikimedia's User-Agent policy (specifically, the requirement to send a descriptive User-Agent header with contact information), which leads to temporary rate limiting/blacklisting of the agent.
See also: https://meta.wikimedia.org/wiki/User-Agent_policy
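(Assuming the library issues its HTTP calls through requests without overriding the default header, which I haven't verified, the User-Agent Wikimedia sees is the generic one below:)

```python
from requests.utils import default_user_agent

# With no explicit header configured, requests identifies itself generically,
# e.g. "python-requests/2.28.1", which is the kind of User-Agent the policy discourages.
print(default_user_agent())
```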
Solution
Set a User-Agent header compliant with the above policy, e.g.:
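(A sketch of what that could look like when calling WDQS with requests; the client name, repository URL, and contact address are placeholders that the library would replace with its own values:)

```python
import requests

# Shape recommended by https://meta.wikimedia.org/wiki/User-Agent_policy:
#   <client name>/<version> (<contact URL or email address>) <library>/<version>
# Every concrete value below is a placeholder.
USER_AGENT = (
    "my-wikidata-client/0.7.0 "
    "(https://github.com/example/my-wikidata-client; maintainer@example.org) "
    f"requests/{requests.__version__}"
)

response = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 1", "format": "json"},
    headers={"User-Agent": USER_AGENT},
)
response.raise_for_status()
print(response.json()["results"]["bindings"])
```

Ideally the library would set a sensible default along these lines and let callers customise or extend the string, so individual applications can identify themselves as the policy recommends.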