deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Downloading files and images to a local hard drive.
MIT License
927 stars, 117 forks

What is the exact rate limit of DDG? #198

Closed heesuju closed 3 months ago

heesuju commented 3 months ago

Hello,

I'm using duckduckgo_search version 5.1.0. In my code, I'm using "AsyncDDGS.text(keyword)" in a for loop. I'm iterating through the loop with an interval of 10 seconds using "await asyncio.sleep(10)".

However, after 5-6 search requests, I get a rate limit error. Any subsequent request also triggers a rate limit error for about 10-15 minutes. Until now I thought the rate limit was 1-2 requests per 10 seconds, but this doesn't seem to be the case.

Is there a total amount of requests that I'm allowed to send? Any help would be appreciated. Thank you in advance.

deedy5 commented 3 months ago

Hi, show me the code.

phamxtien commented 3 months ago

After upgrading to 5.1.0, it returns this error:

_aget_url() https://duckduckgo.com RequestsError: Impersonating BrowserType.chrome120 is not supported

deedy5 commented 3 months ago

@phamxtien Some kind of problem with curl-cffi. Try to reinstall duckduckgo_search: pip install -I duckduckgo_search

phamxtien commented 3 months ago

@phamxtien Some kind of problem with curl-cffi. Try to reinstall duckduckgo_search: pip install -I duckduckgo_search

I followed your guide, but it still returns the error. The CLI, however, runs smoothly. [image]

My code

from duckduckgo_search import DDGS
from bs4 import BeautifulSoup

def ddgSearch(keywords, region='vn-vi', count=5):
    documents = []
    urls = []
    ddgs = DDGS()
    i = 1
    for keyword in keywords:
        icount = 1
        try: 
            for r in ddgs.text(keyword, region=region, safesearch='off', timelimit='y', max_results=count):
                print(r)
                try:
                    response = requests.get(r['href'])
                    soup = BeautifulSoup(response.text, 'html.parser')
                    body = soup.find('body').text
                    body = ' '.join(body.split())
                    documents.append(body)
                    urls.append(r['href'])
                    i = i + 1
                    icount = icount + 1
                    if icount > count: break
                except Exception as e:
                    print(str(e))
                    continue
        except Exception as e:
            print(str(e))
            continue
        time.sleep(6)
    return {'urls': urls, 'documents': documents}

And I get this error: _aget_url() https://duckduckgo.com/ RequestsError: Impersonating BrowserType.chrome120 is not supported

Environment

OS: Ubuntu 23.10
Python: 3.11

deedy5 commented 3 months ago

are you importing requests?

phamxtien commented 3 months ago

are you importing requests?

Yes, I already import requests. I left it out when I wrote the comment above. [image]

deedy5 commented 3 months ago

I don't see the above error when I run your code. Reinstall duckduckgo_search in the virtual environment from which you are running the code.
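One way to check whether the script and the CLI are running from the same environment is to print the interpreter path and the installed package versions from inside the script. The following is a minimal sketch using only the standard library; the helper name `report_environment` is made up for illustration, and the package names are assumed to match the names the distributions were installed under.

```python
import sys
from importlib import metadata


def report_environment(packages=("duckduckgo_search", "curl_cffi")):
    """Return the interpreter path and the installed versions of `packages`."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the environment this interpreter belongs to.
            versions[name] = None
    return {"python": sys.executable, "versions": versions}


if __name__ == "__main__":
    print(report_environment())
```

If the interpreter path printed here differs from the one the CLI is installed under, the reinstall probably went into a different environment than the one running the code.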

phamxtien commented 3 months ago

I think this is what causes the error [image], and I'm still stuck :(

heesuju commented 3 months ago

Hi, show me the code.

Hello again, sorry for the late reply. Here's my sample code.

I'm using the code from the autogpt repository to get search results from duckduckgo_search. This code worked fine until a week ago. I think my IP might have been blocked after making too many requests (I used to use multi-threading to run about 20 requests at once). Now I get a rate limit error after every 5-6 requests.

import asyncio
from itertools import islice
from duckduckgo_search import AsyncDDGS

async def web_search(query: str, num_results: int = 8) -> list[dict]:
    search_results = []
    attempts = 0

    # An empty query can never produce results, so return early.
    if not query:
        return search_results

    while attempts < 3:
        async with AsyncDDGS() as ddgs:
            results = await ddgs.text(query, safesearch='on', max_results=num_results, backend="html")
            search_results = list(islice(results, num_results))

        if search_results:
            break

        # Nothing came back: wait briefly, then retry (at most 3 attempts).
        await asyncio.sleep(1)
        attempts += 1

    return search_results

async def main():
    keywords = ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5"]
    for keyword in keywords:
        # Search one keyword, then pause 10 seconds before the next request.
        results = await web_search(keyword, 10)
        await asyncio.sleep(10)

if __name__ == "__main__":
    asyncio.run(main())

deedy5 commented 3 months ago

  1. Try backend='api'; it's less likely to be blocked.
  2. "I used to use multi-threading to run like 20 requests at once" -> use a proxy.

heesuju commented 3 months ago

Completely forgot to mention that I switched over to 'html' from 'api' after my IP started getting blocked. I guess my only option is to use proxies. Thank you for the help!
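For anyone who wants to pace requests before reaching for proxies, a retry loop with increasing waits is one possible pattern. This is only a sketch: `search_with_backoff` and its parameters are hypothetical names, `search` stands for any coroutine function you supply (for example a thin wrapper around `AsyncDDGS.text`), and catching `Exception` is a broad placeholder because the thread does not name the exact exception class. It does not change whatever limit DuckDuckGo applies, and a block lasting 10-15 minutes would outlast these delays.

```python
import asyncio
import random


async def search_with_backoff(search, query, max_attempts=4,
                              base_delay=10.0, sleep=asyncio.sleep):
    """Call `await search(query)`, waiting longer after each failure.

    `search` is any coroutine function taking the query string.
    `sleep` is injectable so the delays can be observed in tests.
    """
    for attempt in range(max_attempts):
        try:
            return await search(query)
        except Exception:  # placeholder: narrow to the rate-limit exception
            if attempt == max_attempts - 1:
                raise  # out of attempts, let the caller see the error
            # Exponential backoff plus up to 1 second of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            await sleep(delay)
```

With the default `base_delay=10.0`, the waits are roughly 10, 20, and 40 seconds before the final attempt.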

iwo9 commented 2 months ago

Hi - did using proxies resolve the issue? I'm having the same problem - used to work fine, now I keep getting the rate limit exception after 5-6 search runs. Have to wait a while before it can run properly again. Was trying to see if there's a way to actually pay for duckduckgo-search so that I can guarantee it'll work for what I need, but can't find that either.

deedy5 commented 2 months ago

@iwo9 Just use a rotating proxy https://github.com/deedy5/duckduckgo_search?tab=readme-ov-file#proxy
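If you have a list of proxy endpoints rather than a rotating proxy service, a client-side round-robin is one way to approximate the same idea. Note that this is a different technique from a provider that rotates IPs behind a single endpoint. The sketch below is library-agnostic: `make_proxy_rotator` is a hypothetical helper, the URLs in the usage note are placeholders, and how the chosen URL is passed to the search client depends on the installed version (see the README section linked above).

```python
from itertools import cycle


def make_proxy_rotator(proxy_urls):
    """Return a function that gives the next proxy URL on each call."""
    if not proxy_urls:
        raise ValueError("at least one proxy URL is required")
    pool = cycle(list(proxy_urls))  # endless round-robin over the list
    return lambda: next(pool)
```

For example, `next_proxy = make_proxy_rotator(["socks5://127.0.0.1:9150", "http://127.0.0.1:8080"])` creates the rotator, and each search would then be made with a client configured with `next_proxy()`.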