ecprice / newsdiffs

Automatic scraper that tracks changes in news articles over time.

Update for "2020" #59

Open iamvishnurajan opened 4 years ago

iamvishnurajan commented 4 years ago

I noticed an issue with the NYT scraper (and probably the CNN one too, although I don't use it): the feeder_pat line needs to be updated for 2020.

As of now it reads: feeder_pat = '^https?://www.nytimes.com/201'

To catch articles published after Jan 1, it needs to be updated to: feeder_pat = '^https?://www.nytimes.com/202'
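For anyone who wants to check the effect of the change, here is a minimal sketch. It assumes feeder_pat is applied as an ordinary Python regular expression against article URLs; the sample URLs and the matches() helper are made up for illustration, and the ANY_YEAR_FEEDER_PAT variant is only a hypothetical alternative that would avoid a yearly edit, not something from the repo or my fork.

```python
import re

# Pattern as it currently reads: only matches URLs whose path starts with 201x.
OLD_FEEDER_PAT = r'^https?://www.nytimes.com/201'

# Pattern proposed in this issue: matches URLs whose path starts with 202x.
NEW_FEEDER_PAT = r'^https?://www.nytimes.com/202'

# Hypothetical alternative (an assumption, not from the repo): accept any
# four-digit year 2000-2099 so the pattern does not need editing each decade.
ANY_YEAR_FEEDER_PAT = r'^https?://www\.nytimes\.com/20\d\d/'


def matches(pattern, url):
    """Return True if the URL matches the feeder pattern (illustrative helper)."""
    return re.search(pattern, url) is not None


# Made-up example URLs, one dated before and one after Jan 1, 2020.
url_2019 = 'https://www.nytimes.com/2019/12/31/us/example.html'
url_2020 = 'https://www.nytimes.com/2020/01/02/us/example.html'

for name, pat in [('old', OLD_FEEDER_PAT),
                  ('new', NEW_FEEDER_PAT),
                  ('any-year', ANY_YEAR_FEEDER_PAT)]:
    print(name, matches(pat, url_2019), matches(pat, url_2020))
```

Note that the '202' pattern on its own no longer matches 2019 URLs, which may or may not matter depending on whether older articles still show up on the front page.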

I've updated this on my fork, but I have at least one other update to the parser that others may or may not want, so I was hesitant to submit the pull request. I wanted to document it here though in case others were wondering why the system isn't catching new articles post Jan 1.

Thanks much, Vishnu

MaxBittker commented 4 years ago

Thank you, @iamvishnurajan.

I had been totally stumped >.<