deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Downloading files and images to a local hard drive.
MIT License
1.12k stars 131 forks

Got empty results when using the example #35

Closed heya5 closed 1 year ago

heya5 commented 1 year ago

I ran the code, but sometimes I just got None.

from duckduckgo_search import ddg

keywords = 'Bella Ciao'
results = ddg(keywords, region='wt-wt', safesearch='Off', time='y')
print(results)

deedy5 commented 1 year ago

This happens sometimes. duckduckgo.com can block your IP for a few seconds if you send requests too often. In this case, you will get None. Just repeat the request.
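A minimal sketch of "just repeat the request", assuming that None always means a temporary block. The helper name `search_with_retry` and its `attempts` and `delay` parameters are hypothetical and not part of this package; the search call is passed in as a callable, so the helper does not depend on any particular function signature.

```python
import time
from typing import Callable, Optional


def search_with_retry(
    search: Callable[[], Optional[list]],
    attempts: int = 3,
    delay: float = 2.0,
) -> Optional[list]:
    """Call `search` until it returns something other than None.

    Hypothetical helper: None is treated as "temporarily blocked",
    any list (including an empty one) is returned as-is.
    """
    for attempt in range(attempts):
        results = search()
        if results is not None:
            return results
        if attempt < attempts - 1:
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
    return None
```

With the example from this issue it could be called as `search_with_retry(lambda: ddg('Bella Ciao', region='wt-wt', safesearch='Off', time='y'))`. If every attempt fails, the caller still gets None and has to decide what to do.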

heya5 commented 1 year ago

Got it. Thanks!

eulenleber commented 1 year ago

How would you distinguish an actual empty result from an empty result due to timing? Please add something like a notification/callback in case of HTTP 429.

deedy5 commented 1 year ago

It goes something like this:

  • If the result is empty, it will return []

  • If there is an error, it will return None

This package is configured to run in single-threaded sequential mode. Do not run it in multi-threaded mode, to avoid errors!

eulenleber commented 1 year ago

Doesn't look like it: https://github.com/deedy5/duckduckgo_search/blob/a68b64a00b45e4ac0f955495f370ff9560a98693/duckduckgo_search/ddg.py#L47-L49

There the exception is just ignored and an empty list is returned,

i.e. in both cases we get [].

deedy5 commented 1 year ago

https://github.com/deedy5/duckduckgo_search/blob/a68b64a00b45e4ac0f955495f370ff9560a98693/duckduckgo_search/ddg.py#L88-L100

deedy5 commented 1 year ago

If you send queries with the parameter page=1, 2, 3, 4, etc., the API will keep returning results indefinitely. The results will just repeat in random order.

Therefore, to get all results (in this case ddg will return a maximum of 200 results), the max_results parameter was added.

In this case, queries are sent to the API in multithreaded mode (to speed things up) and checked against a cache, to remove duplicates and to avoid unnecessary queries. But since the API can sometimes return an empty answer or an error, for compatibility the request returns [] on error.
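Roughly, the idea can be sketched like this. It is not the package's actual code: `collect_unique` and `fetch_page` are hypothetical names, and the `href` key is assumed to hold each result's URL. A page that comes back empty or as None is simply skipped, which is why an error and a real empty result look the same at the end.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def collect_unique(
    fetch_page: Callable[[int], Optional[List[Dict[str, str]]]],
    pages: range,
    max_results: int,
) -> List[Dict[str, str]]:
    """Fetch pages in parallel and keep each URL only once.

    `fetch_page` is a hypothetical callable returning one page of results
    (a list of dicts with an assumed "href" key), or None/[] on failure.
    """
    seen = set()   # the "cache": URLs that were already collected
    unique = []
    with ThreadPoolExecutor() as pool:
        # pool.map yields the pages in request order.
        for page_results in pool.map(fetch_page, pages):
            for item in page_results or []:   # None and [] both mean "nothing"
                url = item["href"]
                if url not in seen:
                    seen.add(url)
                    unique.append(item)
    return unique[:max_results]
```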

This is a peculiarity of the implementation, i.e. this package is tuned to pull all results as fast as possible.

If you do not use the max_results parameter, the API will return results from the first page only (page=1). And if you do that from different threads, you will get errors, because the API will block you.

The package will return None if there was an error while fetching the vqd, which clearly indicates that your IP is temporarily blocked.
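On the caller's side, the None case can be turned into an explicit error. This is only a sketch: `SearchBlocked` and `require_results` are hypothetical names, not something the package provides. An empty list is passed through unchanged, so it can still mean either "no results" or a swallowed error.

```python
from typing import Optional


class SearchBlocked(Exception):
    """Hypothetical exception for a search call that returned None."""


def require_results(results: Optional[list]) -> list:
    """Raise if the search returned None, otherwise return the list."""
    if results is None:
        # None is described above as a failure to fetch the vqd.
        raise SearchBlocked("got None: the IP is probably temporarily blocked")
    return results
```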

eulenleber commented 1 year ago

@deedy5 example:

import duckduckgo_search

a = True
i = 0
# Repeat the same query until the result is empty (falsy).
while a:
    a = duckduckgo_search.ddg('"test test"')
    print(f"{i} {len(a)}")
    i += 1

Output:

0 29
1 29
2 29
3 29
4 29
5 29
6 29
7 28
8 29
9 29
10 29
11 29
12 29
13 29
14 29
15 29
16 29
17 28
18 28
19 28
20 28
21 28
22 29
23 28
24 0

Obviously request 24 was blocked, but there is no way of knowing that, because the result is just an empty list.

deedy5 commented 1 year ago

This is something new. The 24th request returns a response with status 200, but the response body is not valid JSON:

window.execDeep=function(){return{is506:1,bn:{ivc:1,ibc:0}};};

I need time to figure it out.
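Until then, one possible way to notice this case is to check whether the body parses as JSON at all. A sketch only: `parse_body` is a hypothetical helper, and it assumes a valid answer is a JSON object.

```python
import json
from typing import Optional


def parse_body(body: str) -> Optional[dict]:
    """Return the decoded JSON object, or None if the body is not valid JSON.

    Hypothetical helper: a JavaScript snippet such as the one above fails
    to decode and is reported as None instead of an empty result.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None
    # Assumption: a usable answer is a JSON object, not a bare value.
    return data if isinstance(data, dict) else None
```

A None here would mean "status 200, but not a real answer", which could then be reported differently from an empty result list.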