ClimateMisinformation / Scrapers

Web scrapers

Create container climatediscussionnexus.com #32

Closed. ricjhill closed this 3 years ago

ricjhill commented 3 years ago

This is huge, sorry. I have added a README, infrastructure/docker/README.md, that describes how to run the containers. That document and the git repo should be enough to collect data for the sites that have been containerised.

Created or updated scrapers, with containers, for 7 sites. There is a lot of repetition across the infrastructure/docker/ subdirectories. I plan to containerise the scripts we have first, then look for commonalities. The article HTML structures are inconsistent between sites, and sometimes between different articles on the same site. Ideally, we can work out what is needed to extract the content we want from our targets, then push those changes upstream to newspaper3k (https://pypi.org/project/newspaper3k/).
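To make the point about inconsistent article HTML concrete, here is a minimal sketch of one way to cope with it: try an ordered list of candidate container rules and keep the first one that yields paragraph text. This is not the code in this PR and not newspaper3k's API. It uses only the Python standard library, and the names `ContainerTextParser`, `extract_article_text`, and the class names in `RULES` are hypothetical, chosen for illustration. It also assumes reasonably well-formed HTML (closed `<p>` tags).

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ContainerTextParser(HTMLParser):
    """Collect <p> text inside the first element matching a tag and class.

    Hypothetical helper; assumes tags inside the container are properly closed.
    """

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0             # nesting depth inside the container, 0 = outside
        self.done = False          # True once the first matching container has closed
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == self.tag and self.css_class in classes:
                self.depth = 1     # entered the candidate container
            return
        self.depth += 1
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        if tag == "p":
            self.in_paragraph = False
        self.depth -= 1
        if self.depth == 0:
            self.done = True       # ignore everything after the container

    def handle_data(self, data):
        if self.depth > 0 and self.in_paragraph:
            self.paragraphs[-1] += data


def extract_article_text(html, rules):
    """Return text from the first (tag, class) rule that matches, else None."""
    for tag, css_class in rules:
        parser = ContainerTextParser(tag, css_class)
        parser.feed(html)
        parser.close()
        # Collapse whitespace and drop empty paragraphs.
        paragraphs = [" ".join(p.split()) for p in parser.paragraphs]
        paragraphs = [p for p in paragraphs if p]
        if paragraphs:
            return "\n\n".join(paragraphs)
    return None


# Assumed, illustrative rules: one per article layout seen on a site.
RULES = [("div", "entry-content"), ("article", "post")]
```

Calling `extract_article_text(html, RULES)` on a page whose body lives in `<div class="entry-content">` uses the first rule, while a page that wraps its text in `<article class="post">` falls through to the second. Keeping the rules as data rather than code is one way the repetition across per-site scrapers could eventually be factored out.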