bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions
https://search-engine-parser.readthedocs.io

Bing search is broken #171

Open bentsi opened 2 years ago

bentsi commented 2 years ago

Describe the bug

Running simple code based on the README (see "To Reproduce" below) fails with:

ENGINE FAILURE: Bing
Traceback (most recent call last):
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/bentsi/pycharm-community-2021.2/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/bentsi/devel/continueai/backend/src/scraping/search_engine_query.py", line 17, in <module>
    bresults = bsearch.search(**search_args)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 288, in search
    return self.get_results(soup, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 247, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic

After digging into the root cause, I found the following:

1) The HTTP request to Bing returns an HTML response that contains no results.
2) After adding the cookie that Google Chrome sends in its GET request headers, the code starts working.

So the solution is to add cookie data, but I am not sure exactly what should be added, since the cookie looks complicated.
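To illustrate the idea of sending a cookie along with the GET request, here is a minimal sketch using only the standard library. It builds the request without sending it. The helper name `build_bing_request`, the cookie value, and the user agent string are all placeholders for illustration; the actual cookie Bing expects is not given in this thread, and this is not how search-engine-parser itself issues requests.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values (assumptions): the real cookie is not stated in this thread.
PLACEHOLDER_COOKIE = "NAME=VALUE"
PLACEHOLDER_USER_AGENT = "Mozilla/5.0"


def build_bing_request(query, cookie=PLACEHOLDER_COOKIE,
                       user_agent=PLACEHOLDER_USER_AGENT):
    """Build (but do not send) a GET request to Bing carrying a Cookie header."""
    url = "https://www.bing.com/search?" + urlencode({"q": query})
    headers = {"Cookie": cookie, "User-Agent": user_agent}
    return Request(url, headers=headers)


# Hypothetical helper in use; nothing is sent over the network here.
request = build_bing_request("samsung electronics corp official website")
print(request.full_url)
print(request.get_header("Cookie"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. Whether a given cookie makes Bing return results would still need to be checked against a real browser session.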

To Reproduce

from search_engine_parser.core.engines.bing import Search as BingSearch
company_name = "samsung electronics corp official website"

search_args = {"query": company_name, "page": 1}
bsearch = BingSearch()
bsearch.clear_cache()
bresults = bsearch.search(**search_args)

Expected behavior

Search returns results

Screenshots

Desktop (please complete the following information):

bentsi commented 2 years ago

I managed to find the correct cookie, but I am now getting a result-parsing error:

Traceback (most recent call last):
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 252, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "/home/bentsi/.pyenv/versions/cai-backend/lib/python3.10/site-packages/search_engine_parser/core/engines/bing.py", line 68, in parse_single_result
    rdict["descriptions"] = desc.text
AttributeError: 'NoneType' object has no attribute 'text'

I will work on a fix.
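The `AttributeError` above indicates that `desc` was `None` when `.text` was read, which is what an element lookup returns when nothing matches. One possible guard, shown as a sketch only: the helper `text_or_default` is hypothetical and not part of search-engine-parser, and `SimpleNamespace` stands in for a parsed HTML element so the example runs without extra dependencies.

```python
from types import SimpleNamespace


def text_or_default(node, default=""):
    """Return node.text, or `default` when the lookup found nothing (None)."""
    return node.text if node is not None else default


# Stand-ins (assumptions) for a matched element and a failed lookup.
found = SimpleNamespace(text="Official Samsung site")
missing = None

rdict = {}
# Mirrors the failing assignment in the traceback, but tolerates None.
rdict["descriptions"] = text_or_default(missing)
print(rdict)
```

A guard like this only avoids the crash. If Bing's markup has changed, the selector that locates the description would likely need updating as well.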

deven96 commented 2 years ago

Thanks for the detailed investigation and working on a fix