deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Downloading files and images to a local hard drive.
MIT License

Getting Rate-Limited while having 1-minute breaks in-between #213

Closed CezaPasc closed 1 month ago

CezaPasc commented 1 month ago

Describe the bug

I am getting rate-limited with a 202 status code even though I wait at least 1 minute between requests. It usually happens after 6-7 requests. I'm not sure if it matters, but I can still visit https://duckduckgo.com/ in the browser after being rate-limited.

Steps to reproduce the problem: The problem can be reproduced by executing this:

from duckduckgo_search import DDGS
import time

class DuckDuckGo:

    def __init__(self, limit=5) -> None:
        self.limit = limit
        self.query = ""
        self.duck = DDGS()
        self.new_search = False
        self.last_request = 0
        self.cooldown = 60

    def perform_search(self):
        assert self.query != "", "Please set a search query first using `set_search(query)`"
        now = time.time()
        difference = now - self.last_request
        if difference < self.cooldown:
            to_wait = self.cooldown - difference
            print("Waiting %s before next request" % to_wait)
            time.sleep(to_wait)

        # Send the request
        results = self.duck.text(self.query, max_results=self.limit)
        self.new_search = False
        self.last_request = time.time()

        for result in results:
            result["link"] = result.pop("href") 

        return results

    def set_search(self, query):
        self.query = f'site:finance.yahoo.com/news/ "{query}"'
        self.new_search = True

if __name__ == "__main__":
    duck = DuckDuckGo()

    while True:
        duck.set_search("some")
        results = duck.perform_search()

        duck.set_search("search")
        results = duck.perform_search()

        duck.set_search("terms")
        results = duck.perform_search()

Exception: duckduckgo_search.exceptions.RatelimitException: https://duckduckgo.com/ 202 Ratelimit


deedy5 commented 1 month ago

Try the version with httpx for now: pip install -U duckduckgo_search==5.3.0b4

av1d commented 1 month ago

Was having same issue, the httpx version fixed it. Thank you.

CezaPasc commented 1 month ago

I am now using this version and I am still facing the same issue. duckduckgo_search 5.3.0b4

Do I also have to change something in the code, like specifying a different backend?

DavidEstape commented 1 month ago

Trying the suggested version (5.3.0b4) also solved the problem for me. pip install -U duckduckgo_search==5.3.0b4

In case you are using Windows, you also have to change the default EventLoopPolicy of asyncio library BEFORE importing the other libraries.

import asyncio
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

I don't know if I will be rate-limited again with the httpx version after a while...
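
A note on the event-loop workaround above: `asyncio.WindowsSelectorEventLoopPolicy` exists only on Windows, so running those two lines unchanged on Linux or macOS raises `AttributeError`. Below is a minimal sketch of a platform-guarded variant. The helper name `use_selector_loop_on_windows` is made up for illustration, and the maintainer's reply below indicates the library already applies this policy itself, so this may be unnecessary.

```python
import asyncio
import sys


def use_selector_loop_on_windows():
    """Switch asyncio to the selector event loop policy, but only on Windows.

    Returns True if the policy was changed, False on other platforms.
    """
    if sys.platform.startswith("win"):
        # This attribute is only defined on Windows builds of Python.
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
        return True
    return False


# Call this before importing libraries that create an event loop.
use_selector_loop_on_windows()
```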

deedy5 commented 1 month ago

@DavidEstape

In case you are using Windows, you also have to change the default EventLoopPolicy of asyncio library BEFORE importing the other libraries.

This is applied automatically: https://github.com/deedy5/duckduckgo_search/blob/fce7fab463934f9e4d844ca5ca40a185a656a468/duckduckgo_search/__init__.py#L18C1-L19C76

deedy5 commented 1 month ago

@CezaPasc

I am now using this version and I am still facing the same issue. duckduckgo_search 5.3.0b4 Do I also have to change something in the code, like specifying a different backend?

Changing the backend parameter will not provide any benefits.

DavidEstape commented 1 month ago

@DavidEstape

In case you are using Windows, you also have to change the default EventLoopPolicy of asyncio library BEFORE importing the other libraries.

This is applied automatically: https://github.com/deedy5/duckduckgo_search/blob/fce7fab463934f9e4d844ca5ca40a185a656a468/duckduckgo_search/__init__.py#L18C1-L19C76

Yes. You are correct. I tried without and no warning now, so no need to change the EventLoopPolicy. Thank you for your effort.

deedy5 commented 1 month ago

@CezaPasc

There's a new version (v5.3.1) out, give it a try. pip install -U duckduckgo_search

CezaPasc commented 1 month ago

Could it be that the problem is somehow my fault? I have now tested it twice more: once in the same environment, and once on another server (running Linux instead of macOS) with a fresh Python environment that has only duckduckgo_search installed. duckduckgo_search 5.3.1
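
When comparing results across environments like this, it can help to confirm which version the interpreter actually sees. Here is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is made up for illustration, and the distribution name is assumed to match the one used in the `pip install` commands in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version visible to this interpreter, or None if not installed.
print(installed_version("duckduckgo_search"))
```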

jzxcd commented 1 month ago

Still running into the same issue after updating to v5.3.1. I was running 1 search every 3-5 seconds, for a total of ~50 searches. The error message is a bit confusing, since I can see the search output at the provided URL. Do we just wait out the soft-block period, or is there a fix?

RatelimitException: https://links.duckduckgo.com/d.js?q=sklearn+repo+site%3Agithub.com&kl=wt-wt&l=wt-wt&p=&s=0&df=&vqd=4-292300349075791261695672170567302953005&ex=-1 202 Ratelimit
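
On the "wait it out" question: one common approach is to retry with exponential backoff instead of a fixed pause. Below is a minimal sketch. The helper `call_with_backoff` and its parameters are made up for illustration and are not part of duckduckgo_search. Nothing in this thread confirms that backing off lifts the block (the original report fails even with 60-second pauses), so treat it as an experiment rather than a fix.

```python
import random
import time


def call_with_backoff(func, *, retry_on=Exception, max_attempts=5,
                      base_delay=2.0, max_delay=120.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff plus jitter.

    retry_on is the exception type (or tuple of types) that triggers a retry.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Double the wait each time, cap it, and add up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))


# Hypothetical usage with the names that appear in this thread:
# from duckduckgo_search import DDGS
# from duckduckgo_search.exceptions import RatelimitException
# results = call_with_backoff(
#     lambda: DDGS().text("sklearn repo site:github.com", max_results=5),
#     retry_on=RatelimitException,
# )
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.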

moraneden commented 1 month ago

duckduckgo_search==5.3.0b4 works for me while 5.3.1 still fails

AlexUmnov commented 1 month ago

I'll leave my two cents here. I was using this lib with langchain inside Colaboratory. Restarting the runtime fixed it, while every other solution didn't. Edit: Doesn't seem to always solve the issue

vk-maurya commented 1 month ago

I am on Windows, and it works after adding the code below before importing duckduckgo_search:

import asyncio

asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

from duckduckgo_search import DDGS

results = DDGS().text("who twitter ceo", max_results=5)
print(results)

Thanks @DavidEstape

deedy5 commented 1 month ago

What is the current situation? Does anyone have v5.3.1 or 5.3.1b1 working?

aliciannz commented 1 month ago

Neither 5.3.1 nor 5.3.1b1 is working for me. In my case, it freezes forever instead of raising an exception. However, 5.3.0b4 seems to be working.

deedy5 commented 1 month ago

Update to v6.0.0