bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions
https://search-engine-parser.readthedocs.io
450 stars, 86 forks

Google cannot parse "Tallest mountain in the world" #149

Status: Open. fanzhuyifan opened this issue 3 years ago.

fanzhuyifan commented 3 years ago

Description: The Google engine cannot parse the results returned for the query "Tallest mountain in the world".

To Reproduce: the following snippet reproduces the behavior:

from search_engine_parser.core.engines.google import Search
searcher = Search()
results = searcher.search("Tallest mountain in the world")

Expected behavior: The results are parsed correctly.

Screenshots

Traceback (most recent call last):
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 240, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 151, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/engines/google.py", line 74, in parse_single_result
    title = r_elem.find('div', class_='BNeawe').text
AttributeError: 'NoneType' object has no attribute 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "XXXXX/temp.py", line 4, in <module>
    results = searcher.search("Tallest mountain in the world")
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 270, in search
    return self.get_results(soup, **kwargs)
  File "XXXXX/.conda/envs/info/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 243, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The returned results could not be parsed. This might be due to site updates or server errors. Drop an issue at https://github.com/bisoncorps/search-engine-parser if this persists
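
The two tracebacks above are chained: the AttributeError raised in parse_single_result is still being handled in get_results when NoResultsOrTrafficError is raised, so Python prints both, and the real cause is the missing title div rather than a traffic problem. The sketch below is a minimal, hypothetical reconstruction of that pattern based only on the traceback; the actual base.py and google.py code may differ. It uses a plain dict in place of a BeautifulSoup element, so dict.get stands in for r_elem.find, and NoResultsOrTrafficError is a local stand-in class.

```python
from types import SimpleNamespace


class NoResultsOrTrafficError(Exception):
    """Hypothetical stand-in for search_engine_parser.core.exceptions.NoResultsOrTrafficError."""


def parse_single_result(r_elem):
    # Stand-in for r_elem.find('div', class_='BNeawe'): .get() returns None when
    # the title div is missing, and None.text then raises AttributeError.
    title = r_elem.get("title_div").text
    return {"title": title}


def get_results(elements):
    try:
        return [parse_single_result(e) for e in elements]
    except Exception:
        # Raising inside an except block without "from" chains the exceptions
        # implicitly, which prints "During handling of the above exception,
        # another exception occurred".
        raise NoResultsOrTrafficError("The returned results could not be parsed.")


ok = [{"title_div": SimpleNamespace(text="Infoplease")}]
print(get_results(ok))  # [{'title': 'Infoplease'}]

broken = [{}, {"title_div": SimpleNamespace(text="Infoplease")}]
try:
    get_results(broken)
except NoResultsOrTrafficError as err:
    # The original AttributeError is preserved on __context__.
    print(type(err.__context__).__name__)  # AttributeError
```

In this sketch a single element without a title aborts the whole list, so one unusual result block (such as the table snippet in this issue) is enough to make the entire search fail.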

Additional context: the result block that cannot be parsed:

<div class="ZINbbc xpd O9g5cc uUPGi">
  <div>
    <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&amp;usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div>
    <div class="CgE3Ac I9mEQ">
      <table class="LnMnt">
        <thead>
          <tr><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Rank</div></div></td><td class="IxZjcf sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Mountain</div></div></td><td class="IxZjcf sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe uEec3 AP7Wnd">Country</div></div></td></tr>
        </thead>
        <tbody>
          <tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">1.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Everest</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Nepal/Tibet</div></div></td></tr>
          <tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">2.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">K2 (Mount Godwin Austen)</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Pakistan/China</div></div></td></tr>
          <tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">3.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Kangchenjunga</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">India/Nepal</div></div></td></tr>
          <tr><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">4.</div></div></td><td class="sjsZvd OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Lhotse</div></div></td><td class="sjsZvd s5aIid OE1use"><div class="hfgVwf"><div class="BNeawe s3v9rd AP7Wnd">Nepal/Tibet</div></div></td></tr>
        </tbody>
      </table>
    </div>
    <div class="hwc">
      <div class="Q0HXG"></div>
      <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&amp;usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com &gt; world &gt; geography &gt; top-ten-worlds-highest-mount...</div></span></div></a></div>
    </div>
  </div>
</div>

Applied to this result, the code at https://github.com/bisoncorps/search-engine-parser/blob/0418867b3529980d5a4eb71899dec37092fe7df1/search_engine_parser/core/engines/google.py#L66 returns:

[<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQCw&amp;usg=AOvVaw1pflhmM0gRBSRK5KlKcTT6"><span></span></a></div>,
 <div class="kCrYT"><a href="/url?q=https://www.infoplease.com/world/geography/top-ten-worlds-highest-mountains&amp;sa=U&amp;ved=2ahUKEwih5sjUusjxAhWPFjQIHbqKDhEQFnoECAoQDA&amp;usg=AOvVaw39wAm-G8SzoUzVMu-r2DX6"><div><span><div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World's Highest Mountains - Infoplease</div></span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com &gt; world &gt; geography &gt; top-ten-worlds-highest-mount...</div></span></div></a></div>]

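The first div contains only an empty span and no BNeawe div, so it does not contain the title; the title is in the second div. This is presumably why r_elem.find('div', class_='BNeawe') at google.py line 74 in the traceback returns None.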
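
The observation above suggests a more defensive approach: skip kCrYT blocks that have no BNeawe div instead of calling .text on None. The sketch below is hypothetical and is not the library's code or necessarily what the eventual fix does. To stay dependency-free it uses the standard-library html.parser instead of BeautifulSoup; KCrYTTitleParser and first_title are illustrative names, and treating the first BNeawe div in a block as its title is an assumption based on the HTML in this issue.

```python
from html.parser import HTMLParser
from typing import List, Optional


class KCrYTTitleParser(HTMLParser):
    """For each <div class="kCrYT"> block, record the text of the first nested
    div whose class list contains "BNeawe", or None if the block has none."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[Optional[str]] = []
        self._depth = 0        # div depth inside the current kCrYT block (0 = outside)
        self._title_depth = 0  # depth of the BNeawe div being read (0 = not reading)
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth == 0:
            if "kCrYT" in classes:
                self._depth = 1
                self.titles.append(None)
            return
        self._depth += 1
        # Assumption: only the first BNeawe div in a block is its title.
        if self._title_depth == 0 and self.titles[-1] is None and "BNeawe" in classes:
            self._title_depth = self._depth
            self._buf = []

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        if self._title_depth == self._depth:
            self.titles[-1] = "".join(self._buf)
            self._title_depth = 0
        self._depth -= 1

    def handle_data(self, data):
        if self._title_depth:
            self._buf.append(data)


def first_title(html: str) -> Optional[str]:
    """Return the first title found, skipping blocks without one instead of failing."""
    parser = KCrYTTitleParser()
    parser.feed(html)
    parser.close()
    return next((t for t in parser.titles if t is not None), None)


# The two kCrYT divs from the issue, with the long href values shortened.
sample = (
    '<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/"><span></span></a></div>'
    '<div class="kCrYT"><a href="/url?q=https://www.infoplease.com/"><div><span>'
    '<div class="BNeawe vvjwJb AP7Wnd">The Top Ten: The World\'s Highest Mountains - Infoplease</div>'
    '</span><span><div class="BNeawe UPmit AP7Wnd">www.infoplease.com &gt; world</div></span></div></a></div>'
)

print(first_title(sample))  # The Top Ten: The World's Highest Mountains - Infoplease
```

Whether a library fix should skip such blocks or merge them with the following block that carries the title (and the same link) is a design choice this sketch does not settle.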

MeNsaaH commented 3 years ago

Are you running this on Heroku?

KennBro commented 2 years ago

I have version 0.6.6 installed and get the same error. I am not running on Heroku.

GuyKh commented 2 years ago

Same error on various search queries.

MeNsaaH commented 2 years ago

Is this on Heroku?

icc-sundar commented 2 years ago

I am getting the same error on various search queries. I also tried running it locally instead of on Heroku, but it still does not work.

GigglePocket commented 2 years ago

I am also receiving the same exceptions for all but a few of the simplest single-word search terms.

bentsi commented 2 years ago

#168 should fix it.