catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.
MIT License

scrape epacems crosswalk #15

Closed. cmgosnell closed this issue 2 years ago.

cmgosnell commented 2 years ago

We need to get the EPA CEMS/CAMD to EIA crosswalk table integrated into our system in a durable way. The first step is scraping it and archiving it on Zenodo.

EPA currently publishes the files in this GitHub repo as a mix of CSV and XLSX. We haven't scraped from GitHub before, so we'll need to figure out how to do that right.
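One possible way to approach the GitHub side, as a rough sketch rather than a settled design: ask the GitHub REST API "contents" endpoint for a directory listing and keep only the CSV and XLSX entries. The `owner`, `repo`, `path`, and `ref` arguments are placeholders (the actual EPA repo and directory layout are not filled in here), and the function names are made up for illustration. Pinning `ref` to a tag or commit rather than a branch would probably make the snapshot easier to reproduce.

```python
import json
import urllib.request
from pathlib import PurePosixPath

# File types the issue mentions; adjust if the upstream repo adds others.
WANTED_SUFFIXES = {".csv", ".xlsx"}


def contents_url(owner, repo, path="", ref="main"):
    """Build the GitHub REST API URL that lists one directory of a repo."""
    return f"https://api.github.com/repos/{owner}/{repo}/contents/{path}?ref={ref}"


def select_data_files(listing):
    """Return (name, download_url) pairs for CSV/XLSX files in a listing.

    `listing` is the decoded JSON array returned by the contents endpoint,
    where each entry carries "name", "type", and "download_url".
    """
    return [
        (entry["name"], entry["download_url"])
        for entry in listing
        if entry.get("type") == "file"
        and PurePosixPath(entry["name"]).suffix.lower() in WANTED_SUFFIXES
    ]


def list_data_files(owner, repo, path="", ref="main"):
    """Fetch a directory listing from GitHub and filter it to data files."""
    request = urllib.request.Request(
        contents_url(owner, repo, path, ref),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return select_data_files(json.load(response))
```

Unauthenticated API requests are rate limited, so a real scraper would likely want to send a token and recurse into subdirectories (entries with `"type": "dir"`), neither of which is handled here.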

Once we have Zenodo archives that store these files in a data package, we can work on integrating them into the ETL and database structure.
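To make the "data package" part a bit more concrete, here is a minimal sketch of building a Frictionless-style `datapackage.json` descriptor for a set of downloaded files. The field names (`name`, `resources`, `path`, `format`, `bytes`, `hash`) follow the Data Package spec, but the helper names and the choice of SHA-256 are assumptions for illustration, not necessarily what our existing archivers produce.

```python
import hashlib
import re
from pathlib import Path


def describe_resource(path: Path) -> dict:
    """Describe one downloaded file as a data package resource."""
    data = path.read_bytes()
    # Resource names are restricted to lowercase alphanumerics plus . _ -
    name = re.sub(r"[^a-z0-9._-]+", "-", path.stem.lower())
    return {
        "name": name,
        "path": path.name,
        "format": path.suffix.lstrip(".").lower(),
        "bytes": len(data),
        # Non-MD5 hashes are prefixed with the algorithm name.
        "hash": "sha256:" + hashlib.sha256(data).hexdigest(),
    }


def build_datapackage(name: str, files: list) -> dict:
    """Assemble a descriptor dict covering every file in `files`."""
    return {
        "name": name,
        "resources": [describe_resource(Path(f)) for f in files],
    }
```

Dumping the result with `json.dumps(..., indent=2)` alongside the raw files would give something that can be uploaded to a Zenodo deposition together with the CSV and XLSX files, and the recorded hashes let the ETL verify that it is reading the same snapshot later.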