Closed: GianfrancoCorrea closed this issue 7 months ago
Try using a proxy. The package works, all tests pass. Maybe your IP is blocked by the site.
Same issue here, except I'm not using the async variant, and it seems very intermittent.
Admittedly not a large sample size, but it mainly occurs when I'm handling multiple inbound HTTP requests. Some resource-sharing issue, maybe, but what do I know :)
@deedy5 if it is actually to do with being blocked, could we have a nice way to handle this? I might be a bit of a noob, but I'm not exactly sure how to catch these exceptions...
Awesome work btw :)
If you send multiple requests in parallel, the site will block your IP for a while.
The solution is simple: either send requests sequentially in one stream, or use a proxy so that your IP is different for each request. https://github.com/deedy5/duckduckgo_search#using-proxy
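For the sequential route, a small pacer that enforces a minimum gap between calls keeps you under the limit. This is a generic sketch: `search_fn` is a placeholder for the real call (e.g. `lambda q: DDGS().text(q)`), and the 5-second interval is a guess based on the rates reported in this thread.

```python
import time

class Pacer:
    """Enforce a minimum delay between consecutive calls (single-threaded use)."""
    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._last = 0.0  # monotonic timestamp of the previous call

    def wait(self) -> None:
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()

# interval is an assumption (~2 requests / 10 s per this thread); tune it
pacer = Pacer(min_interval=5.0)

def paced_search(search_fn, query):
    """search_fn stands in for the real call, e.g. lambda q: DDGS().text(q)."""
    pacer.wait()
    return search_fn(query)
```

`time.monotonic()` is used rather than `time.time()` so the pacing is immune to wall-clock adjustments.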
Okay, is there a way to catch and handle these exceptions? At the moment it's happening behind the scenes.
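The exact exception class depends on which version of the package you have installed (the thread later mentions an `HTTPError` from the underlying HTTP client), so a broad catch-and-retry wrapper is the safest sketch; the retry count and delay here are assumptions to tune:

```python
import time

def search_with_retry(search_fn, query, retries=3, delay=10.0):
    """Call search_fn(query), retrying on failure with a fixed pause.

    search_fn stands in for the real call, e.g. lambda q: DDGS().text(q).
    The exception class to catch depends on the duckduckgo_search version,
    so we catch broadly here and re-raise after the last attempt.
    """
    last_exc = None
    for attempt in range(retries):
        try:
            return search_fn(query)
        except Exception as exc:
            last_exc = exc
            if attempt < retries - 1:
                time.sleep(delay)
    raise last_exc
```

If your installed version exposes a specific exception type, narrow the `except` clause to it so genuine bugs still surface.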
Today I started to see the same errors. Sequential queries with `DDGS()` (not async). After 2-3 requests with an interval of <10s the API starts to respond with 202, and this causes an `HTTPError`.
There's no mention in the logs of why `_is_500_in_url(str(resp.url)) or resp.status_code == 202` was added in the first place, and I can't find what 202 means at DDG. Is there a more graceful way to handle it, rather than just raising an error after 2 quick retries?
Have the same issue, which happens occasionally without frequent requests (1 request per 10-20 min). Edit: I was wrong, it sends 4-5 requests in a row once every 10-20 min. Also tested over the CLI; DDG responds with 202 on the 3rd-4th request in a row.
I've built a workaround for the limit. Looks like the limit is ~2 requests per 10 sec:
```python
import asyncio

from duckduckgo_search import AsyncDDGS


class AsyncRateLimitedActionWrapper:
    def __init__(self, rate_limit: int, time_period: float):
        """
        :param rate_limit: The maximum number of requests allowed per time period.
        :param time_period: Time period in seconds over which the rate limit applies.
        """
        self.rate_limit = rate_limit
        self.time_period = time_period
        self.slots = asyncio.Queue(maxsize=rate_limit)
        self.history = asyncio.Queue()
        self.generator = None

    async def _generate_slots(self):
        """
        Generate slots for the rate limit.
        """
        while True:
            if self.slots.empty() or self.history.empty():
                # no calls in the time period window
                await asyncio.sleep(self.time_period)
            else:
                # first call in the time period window
                first_time_call = await self.history.get()
                self.history.task_done()
                current_time = asyncio.get_event_loop().time()
                await asyncio.sleep(self.time_period - current_time + first_time_call)
                await self.slots.get()  # put back the slot once call is out of the framed time period
                self.slots.task_done()

    async def _consume_slot(self):
        """
        Consume a slot.
        """
        if self.generator is None:
            self.generator = self._generate_slots()
            asyncio.create_task(self.generator)
        await self.slots.put(1)
        await self.history.put(asyncio.get_event_loop().time())

    async def perform(self, action: callable, *args, **kwargs):
        """
        Asynchronously make an action, respecting the rate limit.
        """
        await self._consume_slot()
        return await action(*args, **kwargs)


limit_wrapper = AsyncRateLimitedActionWrapper(2, 10)


async def _search(search_query):
    async with AsyncDDGS() as ddgs:
        results = [r async for r in ddgs.text(search_query)]
        return results


async def search(search_queries) -> list[dict]:
    return await limit_wrapper.perform(_search, search_queries)
```
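A shorter alternative to the wrapper above is a single `asyncio.Lock` that serializes calls and enforces a minimum gap between them. The 5-second gap is a guess derived from the ~2-per-10s observation, and `action` stands in for any coroutine function such as the `_search` helper above:

```python
import asyncio

class AsyncPacer:
    """Serialize calls and enforce a minimum gap between consecutive starts."""
    def __init__(self, min_gap: float):
        self.min_gap = min_gap
        self._lock = asyncio.Lock()
        self._last = 0.0  # loop time of the previous call start

    async def run(self, action, *args, **kwargs):
        async with self._lock:
            loop = asyncio.get_running_loop()
            wait = self.min_gap - (loop.time() - self._last)
            if wait > 0:
                await asyncio.sleep(wait)
            self._last = loop.time()
        return await action(*args, **kwargs)

# min_gap is an assumption; e.g. pacer.run(_search, "query")
pacer = AsyncPacer(min_gap=5.0)
```

The lock is released before awaiting the action itself, so slow responses don't hold back the next caller longer than the configured gap.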
Thank you all for finding the error. The site sometimes makes changes. Fixed in version v3.9.6.
Thanks, it mostly works. It started to fail after ~10 min (at a rate of 3 useful requests per 10 s):
```
11:11:00: HTTP Request: POST https://duckduckgo.com "HTTP/2 200 OK"
11:11:00: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=0&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=50&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=100&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=150&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=200&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=250&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=300&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=350&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=400&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=450&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
11:11:01: HTTP Request: GET https://links.duckduckgo.com/d.js?q=Officny&kl=wt-wt&l=wt-wt&bing_market=wt-WT&s=500&df=y&vqd=4-46289096192774020376654725798572440979&o=json&sp=0&ex=-1 "HTTP/2 202 Accepted"
```
What does that `s` mean? Perhaps we need to apply smarter backoff instead of just 10 immediate repeats?
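Instead of immediate repeats, exponential backoff with jitter spreads retries out so a temporarily blocked IP gets a chance to recover. A minimal sketch; the base delay, factor, and cap are illustrative values, not what the library actually does:

```python
import random

def backoff_delays(base=1.0, factor=2.0, attempts=5, max_delay=60.0, jitter=0.1):
    """Yield successive retry delays: base * factor**i, capped at max_delay,
    each perturbed by +/- jitter to avoid synchronized retries."""
    for i in range(attempts):
        delay = min(base * factor ** i, max_delay)
        yield delay * (1 + random.uniform(-jitter, jitter))

# usage sketch: for delay in backoff_delays():
#     make the request; on a 202 response, time.sleep(delay) and try again
```

The jitter matters when several workers share one IP: without it they all retry at the same instants and keep tripping the limit together.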
Try using a proxy. The 202 response code is their way of blocking an IP at the moment.
`s` is pagination.
I tested v3.9.6, the `text()` function with 100 random keywords. There are no 202 responses at all. Working in a single thread does not cause errors, but if you work in multithreaded mode, errors will occur. In that case you should use a proxy.
Yesterday someone reported this bug, but he deleted the issue, so I don't know if it has some solution or what...