Iceloof / GoogleNews

Script for GoogleNews
https://pypi.org/project/GoogleNews/
MIT License
314 stars, 88 forks

get_news() does not take into account the date range #99

Open PashaM999 opened 1 year ago

PashaM999 commented 1 year ago

Hi, I have been looking through the code and found that the function get_news has the url generated as follows:

self.url = 'https://news.google.com/search?q={}+when:{}&hl={}'.format(key,self.__period,self.__lang.lower())

(line 259 in __init__.py)

This uses the self.__period variable, which only handles relative periods such as 7d or 1m. Google also has a search filter for specific dates, which could be implemented in your code as follows:

start = f'{self.__start[-4:]}-{self.__start[:2]}-{self.__start[3:5]}'
end = f'{self.__end[-4:]}-{self.__end[:2]}-{self.__end[3:5]}'
self.url = 'https://news.google.com/search?q={}+before:{}+after:{}&hl={}'.format(key,end, start, self.__lang.lower())

This is merely a suggestion, but I feel that if the __start and __end variables are set, you should prioritize them over the original period-based solution.
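A minimal standalone sketch of that prioritization follows. It is not the library's code: build_search_url and to_iso are hypothetical helper names, and the sketch assumes dates arrive as MM/DD/YYYY strings, as the slicing above implies. It uses the explicit date range when both ends are set and falls back to the when: period otherwise.

```python
from datetime import datetime
from urllib.parse import quote_plus


def to_iso(date_str: str) -> str:
    """Convert an 'MM/DD/YYYY' string to 'YYYY-MM-DD' (raises ValueError if malformed)."""
    return datetime.strptime(date_str, "%m/%d/%Y").strftime("%Y-%m-%d")


def build_search_url(key: str, lang: str = "en", period: str = "",
                     start: str = "", end: str = "") -> str:
    """Hypothetical helper: build a Google News search URL.

    An explicit start/end range takes priority over the relative period.
    """
    query = quote_plus(key)  # URL-encode the search term
    if start and end:
        # Specific dates use the before:/after: search operators.
        query += f"+before:{to_iso(end)}+after:{to_iso(start)}"
    elif period:
        # Relative periods such as 7d use the when: operator.
        query += f"+when:{period}"
    return f"https://news.google.com/search?q={query}&hl={lang.lower()}"


if __name__ == "__main__":
    print(build_search_url("python", "en", period="7d",
                           start="01/01/2021", end="12/31/2021"))
```

Parsing with datetime.strptime instead of slicing the string means a malformed date fails loudly rather than silently producing a bad query.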

Hope that will be useful for someone :)

HurinHu commented 1 year ago

The time period filter does not always work with Google; sometimes Google returns results outside the specified range. Anyway, a start/end filter could probably be added. I will update it in the next release.

guibolla commented 1 year ago

I came across the same issue; PashaM999's solution seems to work.

zxdawn commented 9 months ago

Nice solution! Any plan to fix it?

HurinHu commented 9 months ago

> Nice solution! Any plan to fix it?

This behavior comes from Google; we can't do anything about it.

zxdawn commented 9 months ago

Em ... I have tried to add before:{}+after:{} like @PashaM999 did and it works well for my case. Maybe we can add the function and mention that Google sometimes returns data out of a specific range in the README file?
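Since Google may still return items outside the requested range, one possible workaround is to filter the results on the client side. The sketch below is only an illustration: filter_by_range is a hypothetical helper, and it assumes each result is a dict whose "datetime" key holds a datetime object (or something else when the date could not be parsed).

```python
from datetime import datetime


def filter_by_range(items, start, end):
    """Keep only items whose 'datetime' falls within [start, end].

    start and end are 'MM/DD/YYYY' strings; the end date is inclusive.
    Items without a usable datetime value are dropped.
    """
    lower = datetime.strptime(start, "%m/%d/%Y")
    # Extend the upper bound to the last second of the end date.
    upper = datetime.strptime(end, "%m/%d/%Y").replace(hour=23, minute=59, second=59)
    kept = []
    for item in items:
        stamp = item.get("datetime")  # assumed key name, see note above
        if isinstance(stamp, datetime) and lower <= stamp <= upper:
            kept.append(item)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "in range", "datetime": datetime(2021, 6, 1)},
        {"title": "too recent", "datetime": datetime(2023, 1, 1)},
        {"title": "no date", "datetime": None},
    ]
    print([item["title"] for item in filter_by_range(sample, "01/01/2021", "12/31/2021")])
```

Dropping undated items is a design choice; keeping them instead would favor recall over strictness.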

matanton commented 5 months ago

I am trying this on version 1.6.12, after my code (shown below) was only returning recent news.

import pandas as pd
from GoogleNews import GoogleNews

# Create the client with the desired date range (MM/DD/YYYY)
googlenews = GoogleNews(start='01/01/2021', end='12/31/2021')

# Alternative: googlenews = GoogleNews(lang='pt', region='BR')
googlenews.set_lang('pt')
googlenews.set_time_range('01/01/2021', '12/31/2021')

I am now editing __init__.py (the self.url line now seems to be on line 273). Should I just copy and paste @PashaM999's solution into __init__.py over line 273? And where do I set the start and end dates, in my own code?

hanskwan commented 1 month ago

Is it possible to include @PashaM999's solution in a future update? That would be helpful.