bisohns / search-engine-parser

Lightweight package to query popular search engines and scrape for result titles, links and descriptions
https://search-engine-parser.readthedocs.io

TrafficError #143

Open · AbirHasan2005 opened 3 years ago

AbirHasan2005 commented 3 years ago

Other search engines are working properly, but I'm only getting a TrafficError when using GoogleSearch()! I am hosting on Heroku.

Here are the error logs:

```
2021-04-12T17:34:41.357372+00:00 app[worker.1]: ENGINE FAILURE: Google
2021-04-12T17:34:41.357383+00:00 app[worker.1]:
2021-04-12T17:34:41.360353+00:00 app[worker.1]: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic
2021-04-12T17:34:41.360354+00:00 app[worker.1]: Traceback (most recent call last):
2021-04-12T17:34:41.360355+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/pyrogram/dispatcher.py", line 217, in handler_worker
2021-04-12T17:34:41.360355+00:00 app[worker.1]: await handler.callback(self.client, *args)
2021-04-12T17:34:41.360355+00:00 app[worker.1]: File "/app/plugins/inline.py", line 595, in answer
2021-04-12T17:34:41.360356+00:00 app[worker.1]: gresults = await g_search.async_search(gsearch, 1)
2021-04-12T17:34:41.360356+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 286, in async_search
2021-04-12T17:34:41.360357+00:00 app[worker.1]: return self.get_results(soup, **kwargs)
2021-04-12T17:34:41.360357+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.8/site-packages/search_engine_parser/core/base.py", line 235, in get_results
2021-04-12T17:34:41.360357+00:00 app[worker.1]: raise NoResultsOrTrafficError(
2021-04-12T17:34:41.360360+00:00 app[worker.1]: search_engine_parser.core.exceptions.NoResultsOrTrafficError: The result parsing was unsuccessful. It is either your query could not be found or it was flagged as unusual traffic
```

Hope you guys will fix this soon.
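For reference, a minimal script that reproduces the failing call from the traceback above. The GoogleSearch import path is an assumption based on the project's README for recent versions; older releases exposed it from the top-level package.

```python
# Minimal reproduction sketch of the failing call above. The import path is
# assumed from the project's README; adjust if your installed version exposes
# GoogleSearch from the top-level package instead.
import asyncio

from search_engine_parser.core.engines.google import Search as GoogleSearch


async def main():
    g_search = GoogleSearch()
    # Same call as in the traceback: query string first, then page number.
    gresults = await g_search.async_search("hello world", 1)
    print(gresults["titles"])


asyncio.run(main())
```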

deven96 commented 3 years ago

Is it possible that Heroku IPs are blocked by Google, @MeNsaaH? Or could something else be at play here?

MeNsaaH commented 3 years ago

@AbirHasan2005 can you confirm if this works locally on your machine but doesn't work on Heroku?

AbirHasan2005 commented 3 years ago

@MeNsaaH It's not working on both my local machine and Heroku.

AbirHasan2005 commented 3 years ago

> Is it possible that Heroku IPs are blocked by Google, @MeNsaaH? Or could something else be at play here?

That's not possible.

AbirHasan2005 commented 3 years ago

1 month ago it was working properly.

MeNsaaH commented 3 years ago

Okay. Thank you for that info. It seems Google has updated their page. We'll need to update the parser for Google queries.

MeNsaaH commented 3 years ago

I'll be looking into this

AbirHasan2005 commented 3 years ago

> Okay. Thank you for that info. It seems Google has updated their page. We'll need to update the parser for Google queries.

Thanks a lot sir.

New-dev0 commented 3 years ago

Waiting for that 🙂

buddhhu commented 3 years ago

Me too 🤧

deven96 commented 3 years ago

Hello @AbirHasan2005. Could you confirm if the latest version fixes this? cc @MeNsaaH

AbirHasan2005 commented 3 years ago

> Hello @AbirHasan2005. Could you confirm if the latest version fixes this? cc @MeNsaaH

@deven96 Which version, sir?

search-engine-parser==0.6.2?

deven96 commented 3 years ago

> Hello @AbirHasan2005. Could you confirm if the latest version fixes this? cc @MeNsaaH
>
> @deven96 Which version, sir?
>
> search-engine-parser==0.6.2?

Pip install directly from master and let's see if it works, @AbirHasan2005:

```
pip install git+https://github.com/bisoncorps/search-engine-parser
```

deven96 commented 3 years ago

Planning a way to make sure installed versions get the most up-to-date scraping logic whenever the page structure changes, without having to push a new PyPI version. Cc @AbirHasan2005
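One possible shape for that, purely as a sketch: the URL, file name, and selector keys below are invented for illustration and are not anything the package ships today.

```python
# Hypothetical sketch: load scraping selectors from a remote JSON config so they
# can be refreshed without publishing a new PyPI release. The URL and key names
# are placeholders, not the library's real API.
import json
import urllib.request

SELECTOR_CONFIG_URL = "https://example.com/search-engine-parser/selectors.json"  # placeholder


def load_remote_selectors(url=SELECTOR_CONFIG_URL):
    """Fetch a mapping like {"google": {"result_block": "div.g", "title": "h3"}}."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def title_selector(engine, selectors, fallback="h3"):
    """Pick an engine's title selector, falling back to a baked-in default."""
    return selectors.get(engine, {}).get("title", fallback)
```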

AbirHasan2005 commented 3 years ago

> Planning a way to make sure installed versions get the most up-to-date scraping logic whenever the page structure changes, without having to push a new PyPI version. Cc @AbirHasan2005

Tried it, sir.

But new errors are coming:

```
2021-05-13T16:13:03.816825+00:00 app[worker.1]: name 'proxy_user' is not defined
2021-05-13T16:13:03.816832+00:00 app[worker.1]: Traceback (most recent call last):
2021-05-13T16:13:03.816833+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/pyrogram/dispatcher.py", line 217, in handler_worker
2021-05-13T16:13:03.816834+00:00 app[worker.1]: await handler.callback(self.client, *args)
2021-05-13T16:13:03.816835+00:00 app[worker.1]: File "/app/plugins/inline.py", line 717, in answer
2021-05-13T16:13:03.816835+00:00 app[worker.1]: gresults = await g_search.async_search(gsearch, 1)
2021-05-13T16:13:03.816836+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 307, in async_search
2021-05-13T16:13:03.816837+00:00 app[worker.1]: soup = await self.get_soup(self.get_search_url(query, page, **kwargs), cache=cache, proxy=proxy, proxy_auth=(proxy_user, proxy_password))
2021-05-13T16:13:03.816838+00:00 app[worker.1]: NameError: name 'proxy_user' is not defined
2021-05-13T16:13:04.739774+00:00 app[worker.1]: name 'proxy_user' is not defined
2021-05-13T16:13:04.739796+00:00 app[worker.1]: Traceback (most recent call last):
2021-05-13T16:13:04.739798+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/pyrogram/dispatcher.py", line 217, in handler_worker
2021-05-13T16:13:04.739798+00:00 app[worker.1]: await handler.callback(self.client, *args)
2021-05-13T16:13:04.739799+00:00 app[worker.1]: File "/app/plugins/inline.py", line 717, in answer
2021-05-13T16:13:04.739800+00:00 app[worker.1]: gresults = await g_search.async_search(gsearch, 1)
2021-05-13T16:13:04.739801+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 307, in async_search
2021-05-13T16:13:04.739802+00:00 app[worker.1]: soup = await self.get_soup(self.get_search_url(query, page, **kwargs), cache=cache, proxy=proxy, proxy_auth=(proxy_user, proxy_password))
2021-05-13T16:13:04.739803+00:00 app[worker.1]: NameError: name 'proxy_user' is not defined
```

Do I have to change my code for the new version? Did any parameters change?

deven96 commented 3 years ago

Sorry, it seems the async search was faulty after the addition of proxy support. Could you try again, @AbirHasan2005?
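For context, the NameError suggests `proxy_user` and `proxy_password` were referenced inside `async_search` without being defined in that scope. A simplified sketch of the kind of change involved (not the library's actual code):

```python
# Simplified, hypothetical sketch of the fix: make the proxy credentials explicit
# parameters so they always exist, even when no proxy is configured. This is not
# the library's real implementation.
async def async_search(self, query, page=1, cache=True, proxy=None,
                       proxy_user=None, proxy_password=None, **kwargs):
    # Only build an auth pair when credentials were actually supplied.
    proxy_auth = (proxy_user, proxy_password) if proxy_user else None
    soup = await self.get_soup(
        self.get_search_url(query, page, **kwargs),
        cache=cache, proxy=proxy, proxy_auth=proxy_auth,
    )
    return self.get_results(soup, **kwargs)
```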

AbirHasan2005 commented 3 years ago

> Sorry, it seems the async search was faulty after the addition of proxy support. Could you try again, @AbirHasan2005?

Sorry sir @deven96, the same NoResultsOrTrafficError is coming.

Errors here.

deven96 commented 3 years ago

> Sorry, it seems the async search was faulty after the addition of proxy support. Could you try again, @AbirHasan2005?
>
> Sorry sir @deven96, the same NoResultsOrTrafficError is coming.
>
> Errors here.

It seems to work locally. Could you try to replicate it locally rather than on Heroku so we can narrow it down?

AbirHasan2005 commented 3 years ago

> Sorry, it seems the async search was faulty after the addition of proxy support. Could you try again, @AbirHasan2005?
>
> Sorry sir @deven96, the same NoResultsOrTrafficError is coming. Errors here.
>
> It seems to work locally. Could you try to replicate it locally rather than on Heroku so we can narrow it down?

Yes sir, it works well locally. Tested on Windows 10.

So why is it not working on Heroku?

deven96 commented 3 years ago

NoResultsOrTrafficError typically means the structure of the page received is not fit for scraping, e.g. captcha pages. We should insert some debug statements to view the HTML actually being retrieved; therein lies our answer, @AbirHasan2005.
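To check that independently of the library, something like this will dump the raw HTML Google returns from the dyno. This is a standalone debug sketch, not part of the package.

```python
# Standalone debug sketch: fetch the raw Google results page and save it so we
# can see whether it is a captcha/"unusual traffic" page instead of results.
import asyncio

import aiohttp


async def dump_google_html(query: str, path: str = "google_debug.html") -> None:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get("https://www.google.com/search",
                               params={"q": query}) as resp:
            html = await resp.text()
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    # A blocked request usually mentions "captcha" or "unusual traffic".
    flagged = "captcha" in html.lower() or "unusual traffic" in html.lower()
    print("looks blocked" if flagged else "looks like a normal results page")


if __name__ == "__main__":
    asyncio.run(dump_google_html("hello world"))
```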

AbirHasan2005 commented 3 years ago

It was working on Heroku before; I don't know what suddenly happened. Do you have any suggestions to fix this?

deven96 commented 3 years ago

Is it narrowed down to that particular app, or any Heroku app?

AbirHasan2005 commented 3 years ago

> Is it narrowed down to that particular app, or any Heroku app?

It's the same issue for all Heroku apps.

This is my friend's issue: #142

He's also getting NoResultsOrTrafficError...

AbirHasan2005 commented 3 years ago

Not only him; everyone running on Heroku is getting the same issue.

buddhhu commented 3 years ago

Maybe this will help you:


  File "/usr/local/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 241, in get_results
    search_results = self.parse_result(results, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 150, in parse_result
    rdict = self.parse_single_result(each, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/search_engine_parser/core/engines/google.py", line 77, in parse_single_result
    title = link_tag.find('h3').text
AttributeError: 'NoneType' object has no attribute 'text'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/TeamUltroid/plugins/devtools.py", line 132, in _
    await aexec(cmd, event)
  File "/root/TeamUltroid/plugins/devtools.py", line 181, in aexec
    return await locals()["__aexec"](event, event.client)
  File "<string>", line 8, in __aexec
  File "/usr/local/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 287, in async_search
    return self.get_results(soup, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/search_engine_parser/core/base.py", line 244, in get_results
    raise NoResultsOrTrafficError(
search_engine_parser.core.exceptions.NoResultsOrTrafficError: The returned results could not be parsed. This might be due to site updates or server errors. Drop an issue at https://github.com/bisoncorps/search-engine-parser if this persists```
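The `AttributeError` in this traceback comes from the parser assuming every result block contains an `<h3>` title. A None-safe version of that lookup would look roughly like this (an illustrative sketch, not the project's actual `parse_single_result`):

```python
# Illustrative sketch of a None-safe title lookup for a BeautifulSoup tag;
# not the project's actual google.py code.
def extract_title(link_tag):
    """Return the result title text, or None when Google's markup has changed."""
    if link_tag is None:
        return None
    h3 = link_tag.find("h3")
    return h3.get_text(strip=True) if h3 else None
```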
buddhhu commented 3 years ago

> Maybe this will help you (same traceback as above)

@MeNsaaH check

itzzzyashu commented 2 years ago

> @AbirHasan2005 can you confirm if this works locally on your machine but doesn't work on Heroku?

Sir, in a Telegram userbot it's working fine, but in a Telegram bot (on Heroku) it's showing this error. It sometimes shows that error, depending on the code:

```python
import re

# Import path as documented for recent versions of search-engine-parser;
# older releases exposed GoogleSearch from the top-level package.
from search_engine_parser.core.engines.google import Search as GoogleSearch


async def _(event):
    if event.fwd_from:
        return
    webevent = await event.reply("searching........")
    match = event.pattern_match.group(1)
    page = re.findall(r"page=\d+", match)
    try:
        page = page[0]
        page = page.replace("page=", "")
        # Remove the full "page=N" token from the query. (The original used
        # page[0], which only stripped the first digit for multi-digit pages.)
        match = match.replace("page=" + page, "")
    except IndexError:
        page = 1
    search_args = (str(match), int(page))
    gsearch = GoogleSearch()
    gresults = await gsearch.async_search(*search_args)
    msg = ""
    for i in range(len(gresults["links"])):
        try:
            title = gresults["titles"][i]
            link = gresults["links"][i]
            desc = gresults["descriptions"][i]
            msg += f"❍[{title}]({link})\n**{desc}**\n\n"
        except IndexError:
            break
    await webevent.edit(
        "**Search Query:**\n`" + match + "`\n\n**Results:**\n" + msg, link_preview=False
    )
```

I used this code in the bot, and it showed this error.
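Since the failure surfaces as `NoResultsOrTrafficError` at runtime, a handler like this can at least fail gracefully by catching it. This is a sketch; the exception's import path is taken from the tracebacks earlier in the thread.

```python
# Sketch: wrap the search so the bot reports the block instead of crashing.
# The exception path matches the tracebacks above; everything else is illustrative.
from search_engine_parser.core.exceptions import NoResultsOrTrafficError


async def safe_google_search(gsearch, query, page=1):
    """Return search results, or None when Google blocks or changes its page."""
    try:
        return await gsearch.async_search(query, page)
    except NoResultsOrTrafficError:
        return None
```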

itzzzyashu commented 2 years ago

> Not only him; everyone running on Heroku is getting the same issue.

Then why is it working fine in a Telegram userbot, which is also deployed on Heroku?

MeNsaaH commented 2 years ago

It seems there's an issue with requests from Heroku being blacklisted by Google.

AbirHasan2005 commented 2 years ago

> It seems there's an issue with requests from Heroku being blacklisted by Google.

Yes. It's working on replit.com 🙃

iamgojoof6eyes commented 1 year ago

> It seems there's an issue with requests from Heroku being blacklisted by Google.

But Google search is not working on my VPS either; it keeps throwing NoResultsOrTrafficError. StackOverflow and MyAnimeList search work just fine, but Google search keeps throwing the same error.