catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
471 stars 108 forks

Direct big datasette downloads to AWS Open Data Instead #2464

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

Given that we've just tripled the size of the PUDL DB, we may want to be a little more forceful about directing folks to the AWS Open Data Catalog for bulk downloads. Some clear options:

- [ ] Remove the direct DB download link, or point it at AWS instead.
- [ ] Disable CSV streaming.
- [ ] Add prominent links to AWS on each DB index page / top level index.

ggurjar333 commented 11 months ago

Can I take this?

bendnorman commented 10 months ago

Yes! I think this could be a good first issue. It looks like the links to download the full databases have been removed.

To fix this issue I think you'll need to:

  1. Turn off CSV streaming by adding the allow_csv_stream off setting (on recent Datasette versions, --setting allow_csv_stream off) to the datasette command in the devtools/datasette/fly/run.sh script.
  2. Add S3 download links to the database descriptions. To update the descriptions, edit src/pudl/metadata/templates/datasette-metadata.yml.
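
Roughly, the two steps could look like the sketch below. It is written in Python only for illustration; the real changes belong in run.sh and the YAML template. The `--setting allow_csv_stream off` form is how recent Datasette versions take settings on the command line. Everything else is an assumption: the `S3_BASE_URL` placeholder, the `pudl.sqlite` and `metadata.yml` file names, and both helper functions are hypothetical and not taken from the PUDL repo.

```python
import shlex

# Hypothetical placeholder: substitute the real AWS Open Data location.
S3_BASE_URL = "https://example-bucket.s3.amazonaws.com/pudl"


def datasette_command(db_files, metadata="metadata.yml"):
    """Build a `datasette serve` argument list with CSV streaming disabled."""
    return [
        "datasette", "serve", *db_files,
        "--metadata", metadata,
        # Step 1: turn off streaming of full-table CSV exports.
        "--setting", "allow_csv_stream", "off",
    ]


def description_with_download(description, db_name):
    """Append an S3 bulk-download link (as HTML) to a database description."""
    url = f"{S3_BASE_URL}/{db_name}.sqlite"
    link = f'<a href="{url}">Download the full database from AWS</a>'
    return f"{description} {link}."


# Illustrative inputs only; the real database list lives in run.sh.
cmd = datasette_command(["pudl.sqlite"])
print(shlex.join(cmd))
print(description_with_download("The full PUDL database.", "pudl"))
```

If the description carries an HTML link like this, it most likely needs to go under Datasette's description_html metadata key rather than the plain description key, since plain descriptions are not rendered as HTML.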