ecprice / newsdiffs

Automatic scraper that tracks changes in news articles over time.

NYT scraper no longer works #50

Open iamvishnurajan opened 6 years ago

iamvishnurajan commented 6 years ago

I noticed that on or about May 8, 2018, the NYT scraper stopped working. I'm not great at digesting HTML and Python, but it looks like the way NYT articles are encoded and the way the different fields are tagged have changed. I will try to play with this and see if I can figure it out, but if anyone has expertise here, any assistance would be much appreciated.

ecprice commented 6 years ago

Thanks! You should take a look at pull request #49, which takes a stab at it but isn't quite right for all articles.

iamvishnurajan commented 6 years ago

Thank you very much! I had not seen that - will check it out now.

iamvishnurajan commented 6 years ago

I was able to build on pull request #49 and create an NYT parser that seems to work for me. My pull request, based on #49, is here: https://github.com/carlgieringer/newsdiffs/pull/1.