deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Download files and images to a local hard drive.
MIT License

I would like to report a bug related to the 's' parameter. #252

(Open) ddwego opened this issue 1 week ago

ddwego commented 1 week ago


Describe the bug

First, I understand that the 's' parameter (the 'start' offset used for pagination) cannot be passed directly; only the 'max_results' parameter can be. Inside the method, slist.extend(range(23, max_results, 50)) controls the 's' values. I haven't run into any problems with this before, although I'm not sure why the starting value is 23.
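As a rough illustration of how that line turns max_results into page offsets, here is a minimal sketch. It assumes the list starts as [0] (the first request has no offset) before `extend` runs; the function name `page_offsets` is hypothetical and not part of duckduckgo_search.

```python
def page_offsets(max_results: int) -> list[int]:
    """Return the 's' offsets that would be requested for max_results (sketch)."""
    # Assumption: the first request always starts at offset 0.
    slist = [0]
    # Line quoted from the issue: later pages start at 23, 73, 123, ...
    slist.extend(range(23, max_results, 50))
    return slist


print(page_offsets(50))   # [0, 23]
print(page_offsets(124))  # [0, 23, 73, 123]
```

With max_results=50, as in the query below, the second request would start at offset 23 regardless of how many results the first page actually returned.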

Recently, however, I noticed incomplete search results. Here is my query: DDGS().text("site:www.midge.co.jp", safesearch="off", backend="html", max_results=50). It only returns 4 results, whereas the DuckDuckGo website returns more than 10.

I tried modifying the method in the class to change the 's' value to 5, and after that I was able to retrieve more results.

deedy5 commented 1 week ago

Use backend="html" or backend="lite" only if the "api" backend (the default) doesn't work.

ddwego commented 1 week ago

I have tried the api, html, and lite backends, and all of them return only 4 results, while there are actually more than 10. Only by changing the 's' value to 5 can I retrieve the subsequent results. Are you considering making the 's' parameter adjustable in the future? I don't want to rewrite the method and end up unable to update the library later on.

deedy5 commented 1 week ago

The number of API requests depends on the max_results parameter:

max_results    number of requests
<= 23          1
<= 73          2
<= 123         3
<= 173         4
<= 223         5

and so on. So try setting max_results=24, 74, 124, etc.
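The table can be reproduced from the offset formula: one request for offset 0 plus one request for each value in range(23, max_results, 50). The sketch below assumes the offset list starts as [0] before slist.extend(range(23, max_results, 50)) runs; both function names are hypothetical and only illustrate the arithmetic.

```python
import math


def request_count(max_results: int) -> int:
    # Assumed offset list: [0] followed by range(23, max_results, 50)
    return 1 + len(range(23, max_results, 50))


def request_count_closed_form(max_results: int) -> int:
    # Equivalent closed form: every 50 results past the first 23 add a request
    if max_results <= 23:
        return 1
    return 1 + math.ceil((max_results - 23) / 50)


for limit in (23, 24, 73, 74, 123, 124, 173, 223):
    print(limit, request_count(limit))
```

This also shows why 24, 74, 124, ... are the thresholds: each of them is the smallest value that adds one more offset (23, 73, 123, ...) to the list.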

ddwego commented 1 week ago

I understand what you mean. Normally, DuckDuckGo returns 23 results on the first page, so to reach the second page you need to set max_results above 23, e.g. 24. You then get all results from the first page plus the first result from the second page. This works well most of the time. There are special cases, though, such as DDGS().text("site:www.midge.co.jp", safesearch="off", backend="html", max_results=50), where the first page returns only 5 results even though more than 10 are available. If I set max_results=24, the 's' value for the second page is 23, so results 6-10 are never retrieved.
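One possible way to avoid that gap is to derive each next offset from the number of results the previous page actually returned, instead of using fixed offsets. The sketch below is hypothetical: `paginate` and `fetch_page` are illustrative names, not duckduckgo_search APIs, and `fake_fetch` only mimics the case described here (a short first page of 5 results).

```python
from typing import Callable


def paginate(fetch_page: Callable[[int], list[str]], max_results: int) -> list[str]:
    """Collect results, advancing the offset by the size of each returned page."""
    results: list[str] = []
    offset = 0
    while len(results) < max_results:
        page = fetch_page(offset)
        if not page:  # no more results available
            break
        results.extend(page)
        offset += len(page)  # next request starts right after what we received
    return results[:max_results]


# Fake backend for illustration: 12 results in total, first page returns only 5.
ALL_RESULTS = [f"result {i}" for i in range(1, 13)]


def fake_fetch(offset: int) -> list[str]:
    size = 5 if offset == 0 else 23
    return ALL_RESULTS[offset:offset + size]


fixed = fake_fetch(0) + fake_fetch(23)  # fixed offsets [0, 23]: results 6-12 are skipped
adaptive = paginate(fake_fetch, 50)     # offsets 0, 5, 12: every result is collected
print(len(fixed), len(adaptive))
```

A real implementation would probably also need to deduplicate results (for example by URL), since live pages are not guaranteed to line up exactly with the offsets requested.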