ClimateMisinformation / Scrapers

Web scrapers

uniformize older scrapers #18

Open alexn11 opened 3 years ago

alexn11 commented 3 years ago

Some of the scrapers output different columns (the-bfd.py, cato-institute.py, co2-coalition.py) or are missing the source column (bbc-non-climate, breibart-defense, the-onion-politics). If these are going to be used again, they should be changed to match the other scrapers (and the scripts in the normalizer directory, which are intended to correct that, should then be removed).
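To illustrate what uniformizing could look like, here is a minimal sketch. The column names in `STANDARD_COLUMNS`, the `uniformize` helper, and the example rename mapping are all assumptions made for illustration; the actual schema used by the newer scrapers is not stated in this issue.

```python
# Hypothetical standard schema; the real column names are not given in this issue.
STANDARD_COLUMNS = ["url", "title", "text", "source"]


def uniformize(row, source, renames=None):
    """Return a copy of `row` that has exactly the STANDARD_COLUMNS keys.

    `renames` maps a scraper's own column names to the standard ones.
    """
    renames = renames or {}
    # Rename scraper-specific columns; unknown names are kept unchanged.
    renamed = {renames.get(key, key): value for key, value in row.items()}
    # Fill in the source column when the scraper did not emit one.
    renamed.setdefault("source", source)
    # Drop extra columns and fill any missing ones with an empty string.
    return {column: renamed.get(column, "") for column in STANDARD_COLUMNS}


# Example with made-up column names for an older scraper.
row = {"link": "https://example.org/a", "headline": "A", "body": "some text"}
fixed = uniformize(
    row,
    source="the-bfd",
    renames={"link": "url", "headline": "title", "body": "text"},
)
```

If each older scraper applied something like this before writing its output, the separate correction scripts in the normalizer directory would no longer be needed.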

ricjhill commented 3 years ago

This scraping script works on most data sources, and its output is standardized.

Just build the Docker image and have a look. The differences between the data sources can be fixed in the filter function for each data source and by custom extraction from the HTML.

https://github.com/ClimateMisinformation/Scrapers/tree/create-container-climatediscussionnexus.com/infrastructure/docker/climatediscussionnexus-scrape
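As a rough sketch of the per-data-source filter idea, not the code in the linked branch: one filter function per domain, looked up by the URL's host, with a pass-through default. The registry, the function names, and the `/tag/` rule below are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical registry mapping a domain to its filter function.
FILTERS = {}


def register_filter(domain):
    """Decorator that registers a URL filter for one data source."""
    def decorator(func):
        FILTERS[domain] = func
        return func
    return decorator


@register_filter("climatediscussionnexus.com")
def keep_article_urls(url):
    # Assumed rule, for illustration only: skip tag listing pages.
    return not urlparse(url).path.startswith("/tag/")


def filter_urls(urls):
    """Keep only the URLs accepted by the filter for their data source."""
    kept = []
    for url in urls:
        domain = urlparse(url).netloc
        if domain.startswith("www."):
            domain = domain[len("www."):]
        # Data sources without a registered filter keep every URL.
        accept = FILTERS.get(domain, lambda _url: True)
        if accept(url):
            kept.append(url)
    return kept


urls = [
    "https://climatediscussionnexus.com/tag/example/",
    "https://www.climatediscussionnexus.com/2021/01/01/post/",
    "https://example.org/tag/example/",
]
kept = filter_urls(urls)
```

With this layout, the shared scraping code and the standardized output stay the same, and only the small filter function changes per data source.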