Closed by jpvelez 5 months ago
Identify which build of the PUDL database to use, and grab url
I think the stable-builds from either Zenodo or the AWS buckets would work well for our purposes.
https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#stable-builds
@jpvelez I've started this here: https://github.com/stephendeyoung/pypsa-usa/commit/5978f5d9b7db3009fd3892b4791b6d284f4680c0
I didn't have time to check the contribution guidelines so didn't create a PR yet.
I wasn't clear on adding the `retrieve_pudl` rule to the `Snakefile`. The other `retrieve_***` scripts are not being included in the `Snakefile` and I can see that `retrieve.smk` is being included in the `Snakefile`. Can you clarify?
@stephendeyoung This looks great, thanks! Our contribution guide is out of date (#319) ... but you can submit the PR to the develop branch.
RE: how to add `retrieve_pudl` to the snakemake workflow: you have done it correctly. The new rule will be picked up by the workflow because `retrieve.smk` is included in the `Snakefile`.
Thanks @ktehranchi. I've created the PR now. The PUDL db is still gzipped after the download. LMK if it needs to be uncompressed.
Yep- should be uncompressed. Thank you.
Ok that's done. It was a little more complex than anticipated because I had to decompress the file chunk by chunk (`requests` will do this automatically if the correct headers are set in the response but that wasn't happening in this case).
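For anyone reading later, here is a minimal sketch of the chunk-by-chunk decompression described above. It is not necessarily what the PR does; the function names `gunzip_chunks` and `save_decompressed` are made up for illustration, and it assumes the download is a single gzip stream.

```python
import zlib
from typing import Iterable, Iterator


def gunzip_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decompressed bytes from an iterable of gzip-compressed chunks.

    Hypothetical helper: assumes the chunks together form one gzip stream.
    """
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    # Emit anything still buffered once the input is exhausted.
    tail = decompressor.flush()
    if tail:
        yield tail


def save_decompressed(chunks: Iterable[bytes], dest_path: str) -> None:
    """Write the decompressed stream to dest_path without holding it all in memory."""
    with open(dest_path, "wb") as out:
        for data in gunzip_chunks(chunks):
            out.write(data)
```

With `requests`, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=...)`. When the server does not send a `Content-Encoding: gzip` header, those chunks are still the raw gzip bytes, which is the case this sketch is meant for.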
Awesome, looks good! I will merge the PR, thank you!
We should be ready for #312 now.
Feature Request
We need to add a new snakemake task that downloads the PUDL database.
Suggested Solution
- Add a script in `workflow_scripts` named `retrieve_pudl.py` that downloads PUDL
- Add a rule `retrieve_pudl` that calls `retrieve_pudl.py`
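As a rough sketch of what the download step in `retrieve_pudl.py` could look like (an assumption, not the merged implementation): the function name `retrieve`, its parameters, and the use of the standard library instead of `requests` are all illustrative choices.

```python
import shutil
import urllib.request
from pathlib import Path


def retrieve(url: str, dest: str, chunk_size: int = 1024 * 1024) -> Path:
    """Stream the file at `url` to `dest` and return the destination path.

    Hypothetical helper: the real script may differ in name and behaviour.
    """
    dest_path = Path(dest)
    # Create the output directory if it does not exist yet.
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # Copy in fixed-size chunks so a large database never sits fully in memory.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path
```

Inside a Snakemake `script:` rule, the destination would typically be taken from the injected `snakemake` object, e.g. `retrieve(url, snakemake.output[0])`, with `url` being whichever stable-build link is chosen.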