Iceloof / GoogleNews

Script for GoogleNews
https://pypi.org/project/GoogleNews/
MIT License

Search Returns Empty List No Matter What I Search #45

Closed: dr-alberto closed this issue 3 years ago

dr-alberto commented 3 years ago

Steps to Reproduce

This doesn't happen on my local machine; I'm facing this issue on a US server.

>>> from GoogleNews import GoogleNews
>>> api = GoogleNews()
>>> api.setlang('en')
>>> api.setperiod('d')
>>> api.setencode('utf-8')
>>> api.search('Bitcoin')
>>> api.result()
[]

If I clear the results and search for other terms, I still get an empty list:

>>> api.clear()
>>> api.search('Amazon')
>>> api.result()
[]
>>> api.clear()
>>> api.search('APPLE')
>>> api.result()
[]

However, if I search using get_news(), I get the news as usual:

>>> api.clear()
>>> api.get_news('Tesla')
>>> api.results()
[{'title': "Kelly Evans: We're all buying Tesla at the highs", 'desc': "My polite term for Tesla's valuation right now is “insane.” This is a company worth more than $550 billion. That's not only a staggering sum--making it the biggest ...", 'date': '30 days ago', 'datetime': datetime.datetime(2020, 12, 1, 20, 2, 27), 'link': 'news.google.com/./articles/CAIiEFYt9BkrzON5lyF3cHkseAsqGQgEKhAIACoHCAow2Nb3CjDivdcCMJ_d7gU?hl=en-US&gl=US&ceid=US%3Aen', 'img': 'https://lh3.googleusercontent.com/g3nUNRA_3mBrV6g866OaK6tmy-ageNNgBKmnW86A3RVcBjxjd4-V-jZ6AdMrB15IVmJI0-50H8cWAiGLxNc=-p-df-h100-w100', 'media': None, 'site': 'CNBC'}
....]
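Since get_news() still works on this server, one minimal workaround sketch is to try search() first and fall back to get_news() only when it comes back empty. It assumes the same calls used in the session above (clear(), search() followed by result(), get_news() followed by results()); the helper name news_with_fallback is hypothetical and is not part of GoogleNews.

```python
from typing import Any, List


def news_with_fallback(api: Any, term: str) -> List[dict]:
    """Hypothetical helper: try api.search(); if it yields nothing, use api.get_news()."""
    api.clear()
    api.search(term)
    items = api.result()
    if items:
        return items
    # search() came back empty (e.g. the server IP is treated as a robot);
    # get_news() hits a different Google URL, so try that instead.
    api.clear()
    api.get_news(term)
    return api.results()
```

For example, news_with_fallback(api, 'Bitcoin') on the affected server would be expected to return the get_news() results instead of an empty list. Note that the two methods may not return identical result sets for the same term.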

Context (Environment)

I'm using GoogleNews from a virtualenv. This is what I have tried:

Detailed Description

I would like to understand where the issue is happening and fix it on Python 3.6 and 3.8. It would be great if we could find a solution for this issue, which affects only the search() function.

HurinHu commented 3 years ago

This is a common issue. Google does not allow robots, and some servers/IPs get recognized as robots, which is why search() returns an empty result. The search() and get_news() methods use different Google URLs to fetch results, and Google probably applies different strategies for different purposes. My advice is to avoid running on a cloud server and to add some delay between requests.
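A minimal sketch of the "delay between requests" advice, assuming a fetch callable that returns a list (empty when Google serves nothing). fetch_with_delay is a hypothetical helper, not part of GoogleNews, and the default delay and retry counts are arbitrary placeholders, not documented Google limits.

```python
import time
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def fetch_with_delay(
    fetch: Callable[[str], List[T]],
    terms: Sequence[str],
    delay: float = 5.0,
    retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[T]]:
    """Call fetch(term) for each term, pausing `delay` seconds between requests.

    If a fetch returns an empty list, retry up to `retries` times,
    doubling the pause before each retry (simple exponential backoff).
    """
    results: Dict[str, List[T]] = {}
    for i, term in enumerate(terms):
        if i:
            sleep(delay)  # pause between distinct requests
        wait = delay
        items = fetch(term)
        for _ in range(retries):
            if items:
                break
            sleep(wait)  # back off before retrying an empty result
            wait *= 2
            items = fetch(term)
        results[term] = items
    return results
```

With GoogleNews, fetch could wrap api.clear(), api.search(term) and api.result() from the session above. Delays only reduce the request rate; if the server's IP is already flagged, they may not be enough.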

dr-alberto commented 3 years ago

Thanks for your response, @HurinHu. Unfortunately, I need to use this server for my purposes, and I haven't been able to use the search() method from the very beginning, so I'm not sure why Google has blocked my IP. Adding a delay doesn't help in my case. This issue is new to me; if you could point me to a post discussing the topic and possible solutions, I would much appreciate it.

HurinHu commented 3 years ago

One possible solution is to run behind a proxy that is not blocked by Google, though it is probably not stable. Alternatively, you might try another search engine, such as Bing or Yahoo.
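One way to try the proxy suggestion without patching the library is to install a process-wide urllib opener. This sketch assumes GoogleNews fetches pages through urllib.request.urlopen(), which the thread does not confirm; install_proxy is a hypothetical helper and http://proxy.example.com:8080 is a placeholder address.

```python
import urllib.request


def install_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Hypothetical helper: route later urllib.request.urlopen() calls through proxy_url."""
    # Passing our own ProxyHandler replaces the default one that build_opener adds.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # install_opener() makes this opener the global default for urlopen().
    urllib.request.install_opener(opener)
    return opener
```

Call install_proxy("http://proxy.example.com:8080") before api.search(...). By default urllib also reads the HTTP_PROXY/HTTPS_PROXY environment variables, which avoids code changes if the library does use urllib.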

dr-alberto commented 3 years ago

I will try that. This definitely seems to be the issue: when I run curl -v https://google.com I get a 302 redirect, so Google is probably blocking my IP somehow. I will try other search engines as you suggested, but for now I can still use get_news(), so I don't have to start from scratch.
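The curl -v check can be reproduced from Python to see the status code and the Location header without following redirects. A caveat: google.com typically redirects to www.google.com even for unblocked clients, so a 3xx alone is not proof of a block. The check for a "/sorry/" path (Google's unusual-traffic page) is a heuristic assumption, and probe and classify are hypothetical helpers.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one GET without following redirects, like plain `curl -v`."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def classify(status: int, location: Optional[str]) -> str:
    """Heuristic label for a probe result (the '/sorry/' check is an assumption)."""
    if 300 <= status < 400:
        if location and "/sorry/" in location:
            return "blocked"
        return "redirect"  # e.g. google.com -> www.google.com is normal
    if status == 429:
        return "rate-limited"  # HTTP 429 Too Many Requests
    return "ok" if status == 200 else f"http-{status}"
```

For example, classify(*probe("https://www.google.com/search?q=bitcoin")) gives a rough signal of whether the server is being challenged.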