catalyst-cooperative / pudl-scrapers

Scrapers used to acquire snapshots of raw data inputs for versioned archiving and replicable analysis.

MIT License

Since I've been poking around in the scrapers and it's fresh in my mind, I went ahead and created scrapers for all the old Visual FoxPro DBF databases.

FERC 1: updated to point at the new URLs.
FERC 2: 1991-1999 as DAT files (split into 2 parts per year); 1996-2021Q2 as DBF
FERC 6: 2000-2020 as DBF
FERC 60: 2006-2020 as DBF

I simplified the FERC 1 scraper so that it always downloads all available years, rather than allowing one year at a time, and used the same pattern for all of the other scrapers as well.
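To make the "always fetch every year" pattern concrete, here is a minimal sketch that enumerates the archives each new scraper is expected to collect. The year ranges come from the list above; the names `Archive`, `COVERAGE`, and `expected_archives` are hypothetical and are not taken from the actual spiders, and FERC 1 is left out because its year range is not stated here.

```python
from typing import Iterator, NamedTuple, Optional


class Archive(NamedTuple):
    """One file a scraper is expected to download (hypothetical structure)."""
    form: str
    year: int
    fmt: str
    part: Optional[int] = None  # only set when a year is split into parts


# (format, first year, last year, parts per year), inclusive ranges.
# Assumption: values are copied from the PR description, not from the spiders.
COVERAGE = {
    "ferc2": [("dat", 1991, 1999, 2), ("dbf", 1996, 2021, 1)],
    "ferc6": [("dbf", 2000, 2020, 1)],
    "ferc60": [("dbf", 2006, 2020, 1)],
}


def expected_archives(form: str) -> Iterator[Archive]:
    """Yield every expected archive for a form; there is no per-year option."""
    for fmt, first, last, parts in COVERAGE[form]:
        for year in range(first, last + 1):
            for part in range(1, parts + 1):
                yield Archive(form, year, fmt, part if parts > 1 else None)
```

Because there is no year argument, a run either collects the whole range or fails, which keeps every scraper on the same code path.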

I didn't add tests for all of these basically identical scrapers. The existing FERC 1 tests seemed kind of perfunctory, and I'm not sure what the right kinds of tests are for these things. I guess we want to make sure that they actually get all of the available files, and that they keep working with the data out there in the real world. Is that an integration test we should implement? Or is it the nature of the scrapers that we should just be running them for real on a regular basis, and fix them when that usage breaks?
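One possible shape for the "did we get all of the available files" check is sketched below. It only compares the years a scraper run produced against the years we expect; `missing_years` and `check_complete` are hypothetical helpers, not existing code in this repo, and how the found years get extracted from a real run is left open.

```python
from typing import Iterable, List


def missing_years(found: Iterable[int], first: int, last: int) -> List[int]:
    """Return the years in [first, last] that are absent from `found`."""
    return sorted(set(range(first, last + 1)) - set(found))


def check_complete(form: str, found: Iterable[int], first: int, last: int) -> None:
    """Raise AssertionError naming any expected year with no archive."""
    missing = missing_years(found, first, last)
    if missing:
        raise AssertionError(f"{form}: no archive found for years {missing}")


# A complete FERC 6 run (2000-2020 inclusive) passes silently.
check_complete("ferc6", range(2000, 2021), 2000, 2020)
```

The same check could run either as an integration test or after each scheduled real run, so it does not settle the question above, it just covers both options.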

Closes #41

Codecov Report

Merging #49 (237f6bb) into main (c2941f9) will decrease coverage by 1.2%. The diff coverage is 54.7%.

@@           Coverage Diff           @@
##            main     #49     +/-   ##
=======================================
- Coverage   64.6%   63.4%   -1.3%     
=======================================
  Files         15      18      +3     
  Lines        555     637     +82     
=======================================
+ Hits         359     404     +45     
- Misses       196     233     +37

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/pudl_scrapers/spiders/ferc2.py | `39.3% <39.3%> (ø)` | |
| src/pudl_scrapers/spiders/ferc6.py | `56.5% <56.5%> (ø)` | |
| src/pudl_scrapers/spiders/ferc60.py | `56.5% <56.5%> (ø)` | |
| src/pudl_scrapers/items.py | `76.0% <76.9%> (+0.3%)` | :arrow_up: |
| src/pudl_scrapers/spiders/ferc1.py | `73.9% <100.0%> (+10.2%)` | :arrow_up: |

Create scrapers for historical FERC VisualFoxPro data #49
