deedy5 / duckduckgo_search

Search for words, documents, images, videos, news, maps and text translation using the DuckDuckGo.com search engine. Also downloads files and images to a local hard drive.
MIT License
937 stars, 117 forks

HTTPError 403 Client Error: Forbidden for url #84

Closed: AbdullahAlfaraj closed this issue 1 year ago

AbdullahAlfaraj commented 1 year ago

Describe the bug

I can't get the image search feature to work. Here is a Colab notebook that reproduces the issue: https://colab.research.google.com/drive/1yAVjMaZxe_eaVJ-J1cvqPbfBcBaQWYvU?usp=sharing

Debug log

WARNING:duckduckgo_search.duckduckgo_search:_get_url() https://duckduckgo.com/i.js HTTPError 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=butterfly&vqd=4-127388145370558534196193019016331789945&f=%2C%2Ccolor%3AMonochrome%2C%2C%2C&p=-1
WARNING:duckduckgo_search.duckduckgo_search:_get_url() https://duckduckgo.com/i.js HTTPError 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=butterfly&vqd=4-127388145370558534196193019016331789945&f=%2C%2Ccolor%3AMonochrome%2C%2C%2C&p=-1
WARNING:duckduckgo_search.duckduckgo_search:_get_url() https://duckduckgo.com/i.js HTTPError 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=butterfly&vqd=4-127388145370558534196193019016331789945&f=%2C%2Ccolor%3AMonochrome%2C%2C%2C&p=-1
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-2-fe9bf968464c> in <cell line: 16>()
     14     license_image=None,
     15 )
---> 16 for r in ddgs_images_gen:
     17     print(r)

3 frames
/usr/local/lib/python3.10/dist-packages/duckduckgo_search/duckduckgo_search.py in images(self, keywords, region, safesearch, timelimit, size, color, type_image, layout, license_image)
    395         cache = set()
    396         for _ in range(10):
--> 397             resp = self._get_url("GET", "https://duckduckgo.com/i.js", params=payload)
    398             if resp is None:
    399                 break

/usr/local/lib/python3.10/dist-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     69                 logger.warning(f"_get_url() {url} {type(ex).__name__} {ex}")
     70                 if i >= 2 or "418" in str(ex):
---> 71                     raise ex
     72             sleep(3)
     73         return None

/usr/local/lib/python3.10/dist-packages/duckduckgo_search/duckduckgo_search.py in _get_url(self, method, url, **kwargs)
     63                 if self._is_500_in_url(resp.url) or resp.status_code == 202:
     64                     raise requests.HTTPError
---> 65                 resp.raise_for_status()
     66                 if resp.status_code == 200:
     67                     return resp

/usr/local/lib/python3.10/dist-packages/requests/models.py in raise_for_status(self)
   1019 
   1020         if http_error_msg:
-> 1021             raise HTTPError(http_error_msg, response=self)
   1022 
   1023     def close(self):

HTTPError: 403 Client Error: Forbidden for url: https://duckduckgo.com/i.js?l=wt-wt&o=json&s=0&q=butterfly&vqd=4-127388145370558534196193019016331789945&f=%2C%2Ccolor%3AMonochrome%2C%2C%2C&p=-1
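
The three identical warnings before the exception are consistent with the retry loop partly visible in the `_get_url` frames above (lines 63 to 73 of `duckduckgo_search.py`). Below is a minimal sketch of that pattern, assuming three attempts with a 3-second pause between them. `get_with_retries`, `fetch`, and the injectable `sleep` are hypothetical names used for illustration; this is not the library's actual code.

```python
import time


def get_with_retries(fetch, attempts=3, delay=3.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times and re-raise the final failure.

    Hypothetical helper that mirrors the shape of the loop seen in the
    traceback; it is not part of duckduckgo_search.
    """
    for i in range(attempts):
        try:
            return fetch()
        except Exception as ex:
            # One warning per failed attempt, as in the debug log.
            print(f"attempt {i + 1} failed: {type(ex).__name__} {ex}")
            # Give up on the last attempt, or immediately on a 418 response.
            if i >= attempts - 1 or "418" in str(ex):
                raise
        sleep(delay)
    return None
```

With a server that answers 403 every time, this logs three failures and then raises, which matches the log: retrying alone does not get past the block.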


deedy5 commented 1 year ago

The site has added some kind of bot protection that blocks these requests. Each time I release an update, the requests start getting blocked again after a while. I have now switched the requests to HTTP/2, but how long that will keep working is an open question. Please update to v3.6.0 and give it a try.
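
To confirm that the environment actually picked up v3.6.0 or later (for example after `pip install -U duckduckgo_search` in Colab), a small check like the following may help. It is only a sketch: `parse_version`, `is_at_least`, and `installed_version` are hypothetical helper names, and the version parsing is deliberately simplified (it ignores suffixes such as `rc1`).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '3.6.0' into (3, 6, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    # Pad so that '3.6' compares equal to '3.6.0'.
    return tuple((parts + [0, 0, 0])[:3])


def is_at_least(installed, minimum="3.6.0"):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package="duckduckgo_search"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("duckduckgo_search is not installed")
    elif is_at_least(found):
        print(f"{found} is v3.6.0 or newer")
    else:
        print(f"{found} is older than v3.6.0; upgrade and retry")
```

In a notebook, the runtime usually has to be restarted after upgrading before the new version is imported.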

AbdullahAlfaraj commented 1 year ago

@deedy5, Thanks a lot, it worked.