LuChang-CS / news-crawler

A news crawler for BBC News, Reuters and New York Times.
108 stars, 40 forks

What if there isn't any news after 2017.03? #18

Open YunBAI-PSL opened 2 years ago

YunBAI-PSL commented 2 years ago

Dear Author,

Thanks for the nice work. I ran your code and found that it returns no news after 2017.03, but I need more recent news. How do you handle this kind of problem?

Many thanks.

LuChang-CS commented 2 years ago

Hi, thank you for your interest.

Did you change the time range setting in the settings/*.cfg files? You may also need to set a larger sleep time, because frequent visits to nytimes from the same IP may trigger their reCAPTCHA verification.
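
For anyone unsure what a larger sleep time means in practice, here is a minimal sketch of pausing between requests. It is not the crawler's actual code: the function name crawl_politely and the parameters base_delay, jitter, fetch and sleep are all hypothetical, and the 5 second default is only an illustrative value, not a known safe threshold for nytimes.

```python
import random
import time


def crawl_politely(urls, fetch, base_delay=5.0, jitter=2.0, sleep=time.sleep):
    """Fetch each URL in turn, pausing between consecutive requests.

    All names here are hypothetical. `fetch` is any callable that takes a
    URL and returns a result; `sleep` is injectable so the pause can be
    replaced in tests.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait base_delay seconds plus a random extra of up to `jitter`
            # seconds so that requests are not evenly spaced.
            sleep(base_delay + random.uniform(0, jitter))
        results.append(fetch(url))
    return results


# Demo with a stand-in fetch function and a recorder instead of a real sleep.
recorded_delays = []
pages = crawl_politely(
    ["url-1", "url-2", "url-3"],
    fetch=lambda url: "content of " + url,
    sleep=recorded_delays.append,
)
print(pages)
print(len(recorded_delays))  # one pause between each pair of requests
```

With a real fetch function, leave sleep at its default so the loop actually waits between requests.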

swthinking commented 2 years ago

Even with the date set in the cfg file, no data after 2017 can be crawled.

ducnva commented 1 year ago

Maybe the class name has changed, so you cannot get all the article links. You can check line 31.

liyucheng09 commented 1 year ago

Just change line 31 to: elements = soup.table.find_all('a')

I just tested it, and it runs without problems.
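
To show what that change does, here is a small standalone sketch using BeautifulSoup. The sample HTML and the helper name extract_table_links are made up for illustration, and the real archive page markup may differ. The point is that soup.table.find_all('a') collects every link inside the first table, without depending on a class name that the site may rename.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real archive page may look different.
SAMPLE_HTML = """
<html><body>
<a href="/nav">Navigation link outside the table</a>
<table>
  <tr><td><a href="/article/one">One</a></td></tr>
  <tr><td><a href="/article/two">Two</a></td></tr>
  <tr><td><a>Anchor without href</a></td></tr>
</table>
</body></html>
"""


def extract_table_links(html):
    """Return the href of every link inside the first <table> of the page."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.table is None:
        # No table on the page, for example an error or CAPTCHA page.
        return []
    elements = soup.table.find_all("a")
    # Skip anchors that carry no href attribute.
    return [a["href"] for a in elements if a.has_attr("href")]


links = extract_table_links(SAMPLE_HTML)
print(links)
```

The None check is an addition of this sketch: if the site serves a page without a table, soup.table is None and calling find_all on it would raise an AttributeError.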