I noticed an issue with the NYT scraper (and probably the CNN one as well, although I don't use it): the feeder_pat line needs to be updated for 2020.
As of now it reads:
feeder_pat = '^https?://www.nytimes.com/201'
To catch articles published after Jan 1, it needs to be updated to:
feeder_pat = '^https?://www.nytimes.com/202'
I've updated this on my fork, but I have at least one other change to the parser that others may or may not want, so I was hesitant to submit a pull request. I wanted to document it here, though, in case others are wondering why the system isn't catching new articles published after Jan 1.
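For anyone who would rather not repeat this edit every decade, here is a rough sketch of a pattern that accepts any four-digit year beginning with 20. This is only a suggestion and not what is on my fork: the `is_article_url` helper is hypothetical and exists just to exercise the pattern, and I'm assuming the scraper applies feeder_pat with `re.search`, which I haven't verified. Escaping the dots and adding the trailing slash are optional tightenings.

```python
import re

# Assumed alternative to the hard-coded '201' / '202' prefix:
# match any year from 2000 to 2099 so the pattern keeps working past 2029.
feeder_pat = r'^https?://www\.nytimes\.com/20\d{2}/'


def is_article_url(url):
    """Hypothetical helper: True if the URL looks like a dated NYT article."""
    return re.search(feeder_pat, url) is not None


if __name__ == '__main__':
    # Illustrative URLs only; the paths are made up.
    for url in (
        'https://www.nytimes.com/2019/12/31/world/example.html',
        'https://www.nytimes.com/2020/01/02/us/example.html',
        'https://www.nytimes.com/section/world',
    ):
        print(url, is_article_url(url))
```

The trade-off is that this also matches URLs from 2000 to 2009, which the original '201' prefix excluded; if that matters, `20[1-9]\d` would keep the lower bound at 2010.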
Thanks much, Vishnu