alan-turing-institute / misinformation-crawler

Web crawler to collect snapshots of articles to web archive
MIT License

vox.com extraction issues #331

Closed by jemrobinson 5 years ago

jemrobinson commented 5 years ago

No articles are being extracted from vox.com. The errors take the following form:

2019-07-24 11:33:50  WARNING: No elements could be found from https://www.vox.com/first-person/2019/5/17/18629233/alabama-missouri-abortion-ban-2019 matching //div[@class="c-entry-content"] expected by match_rule 'single'. Returning None.
2019-07-24 11:33:50     INFO:   no article found for: https://www.vox.com/first-person/2019/5/17/18629233/alabama-missouri-abortion-ban-2019
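
The log does not say why the XPath finds nothing. One possible cause (an assumption, not confirmed in this issue) is that the target `div` carries more than one class, so the exact comparison `@class="c-entry-content"` no longer matches. A minimal sketch of that failure mode, using only the standard library and made-up markup (the `c-entry-content--wide` class is hypothetical, as are the helper names):

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the class attribute carries an extra token.
# This is NOT taken from vox.com; it only illustrates the failure mode.
SAMPLE = """
<html><body>
  <div class="c-entry-content c-entry-content--wide"><p>Article text</p></div>
</body></html>
""".strip()


def find_exact(root, class_value):
    """Exact attribute comparison, equivalent to //div[@class="..."]."""
    return root.findall(f".//div[@class='{class_value}']")


def find_by_token(root, token):
    """Match any div whose class attribute contains `token` as a whole word."""
    return [
        div for div in root.iter("div")
        if token in div.get("class", "").split()
    ]


root = ET.fromstring(SAMPLE)
exact = find_exact(root, "c-entry-content")
by_token = find_by_token(root, "c-entry-content")
print(len(exact), len(by_token))
```

If this is what is happening, the XPath 1.0 equivalent of the token match is `//div[contains(concat(" ", normalize-space(@class), " "), " c-entry-content ")]`. If the page instead renders its content with JavaScript or has renamed the container, the selector would need a different fix.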