alexn11 opened 3 years ago
This scraping script works on most data sources, and the output is standardized.
Just build the Docker image from the Dockerfile and have a look. Differences between the data sources can be fixed in each data source's filter function and with custom extraction from the HTML.
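A per-datasource filter function could look roughly like this. It is a minimal sketch under assumptions: the standardized column names (`source`, `title`, `text`, `url`), the `example-site` datasource with its raw `headline`/`body`/`link` fields, and the names `STANDARD_COLUMNS`, `FILTERS` and `standardize` are all hypothetical and not taken from the repository.

```python
from typing import Callable, Dict, List

# Hypothetical standardized schema; the real column names are not given in the issue.
STANDARD_COLUMNS = ["source", "title", "text", "url"]

RawRecord = Dict[str, str]
Filter = Callable[[RawRecord], RawRecord]


def filter_default(raw: RawRecord) -> RawRecord:
    """Keep only the standard columns, filling any that are absent with ''."""
    return {col: raw.get(col, "") for col in STANDARD_COLUMNS}


def filter_example_site(raw: RawRecord) -> RawRecord:
    """Hypothetical datasource whose HTML extraction yields 'headline'/'body'/'link'."""
    return {
        "source": raw.get("source", "example-site"),
        "title": raw.get("headline", "").strip(),
        "text": raw.get("body", "").strip(),
        "url": raw.get("link", ""),
    }


# One filter per datasource; datasources without an entry use the default filter.
FILTERS: Dict[str, Filter] = {"example-site": filter_example_site}


def standardize(datasource: str, records: List[RawRecord]) -> List[RawRecord]:
    """Map every raw record from one datasource onto the standard columns."""
    filt = FILTERS.get(datasource, filter_default)
    return [filt(record) for record in records]
```

For example, `standardize("example-site", [{"headline": " Hi ", "body": "Text", "link": "https://example.com/a"}])` yields one row with the four standard columns and `source` set to `example-site`. Keeping the datasource-specific quirks inside each filter means the rest of the pipeline only ever sees one schema.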
Some of the scrapers output different columns (the-bfd.py, cato-institute.py, co2-coalition.py) or are missing the source column (bbc-non-climate, breibart-defense, the-onion-politics). If these scrapers are to be used again, they should be changed, and the scripts in the normalizer directory that are intended to correct this should be removed.
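Before reusing one of these scrapers, a quick check of its output against the expected columns can show what needs changing. This is a hedged sketch: it assumes the scraper output can be loaded as a list of dicts, and `EXPECTED_COLUMNS`, `column_diff` and `fill_source` are hypothetical names, not code from the repository or its normalizer directory.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, str]

# Hypothetical expected columns; adjust to the real standardized output.
EXPECTED_COLUMNS: Sequence[str] = ("source", "title", "text", "url")


def column_diff(rows: List[Row]) -> Tuple[Set[str], Set[str]]:
    """Return (missing, extra) columns across all rows, compared with EXPECTED_COLUMNS."""
    seen: Set[str] = set()
    for row in rows:
        seen.update(row)
    expected = set(EXPECTED_COLUMNS)
    return expected - seen, seen - expected


def fill_source(rows: List[Row], source_name: str) -> List[Row]:
    """Return copies of the rows with 'source' set wherever it is missing or empty."""
    return [{**row, "source": row.get("source") or source_name} for row in rows]
```

Doing the `fill_source` step inside the scraper itself, rather than in a separate script afterwards, is what would make the normalizer scripts unnecessary.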