ClimateMisinformation / Scrapers

Web scrapers

Dailymailscrapper #5

Closed ricjhill closed 3 years ago

ricjhill commented 3 years ago

This scraper scrapes the Daily Mail test site. Run it from the root of the repo, like the other scripts. If successful, it creates a CSV file in the data/ directory. I have added very basic error handling. Dates are in Month/Day/Year format, e.g. 12/9/2020. I assume this format will be used for all scrapers; any date format is fine, but consistency helps us be more efficient. If no author is listed in the article, the author field is left blank. The tag used is "neutral"; that should change. We could add a README describing suitable tags. The scraper can pull about 100 articles per day, but the CSV file is rewritten every time. We can clean and push the CSV files to storage daily, but I'd like your thoughts on it before implementing something.
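
To make the discussion concrete, here is a minimal sketch of the row conventions described above (unpadded Month/Day/Year date, blank author when none is listed, a placeholder tag), plus one possible alternative to rewriting the CSV on every run: appending to it. This is not the code in this PR. The column names, the function names, and the file path in the usage note are all assumptions made for illustration.

```python
import csv
import os
from datetime import date

# Hypothetical column names; the scraper's real CSV layout may differ.
FIELDNAMES = ["date", "author", "title", "tag"]


def format_date(d: date) -> str:
    """Format a date as unpadded Month/Day/Year, e.g. 12/9/2020."""
    return f"{d.month}/{d.day}/{d.year}"


def make_row(title, published, author=None, tag="neutral"):
    """Build one CSV row; a missing author becomes an empty string."""
    return {
        "date": format_date(published),
        "author": author or "",
        "title": title,
        "tag": tag,  # placeholder until suitable tags are agreed on
    }


def append_rows(path, rows):
    """Append rows to the CSV, writing the header only for a new or empty file."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

For example, `append_rows("data/dailymail.csv", [make_row("Some headline", date(2020, 12, 9))])` would add one row with the date `12/9/2020` and an empty author field, where `data/dailymail.csv` is a hypothetical file name. Appending keeps earlier runs in the file, but it does not deduplicate articles scraped twice, so a daily clean-and-push step might still be needed.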