Add `retrieve_pudl` snakemake module

PyPSA / pypsa-usa

PyPSA-USA: An Open-Source Energy System Optimization Model for the United States

https://pypsa-usa.readthedocs.io

MIT License


Add `retrieve_pudl` snakemake module #311

Closed. jpvelez closed this issue 5 months ago.

jpvelez commented 6 months ago

Feature Request

We need to add a new snakemake task that downloads the PUDL database.

Suggested Solution

[x] Identify which build of the PUDL database to use, and grab url
[x] Write a new module in `workflow_scripts` named `retrieve_pudl.py` that downloads PUDL
[x] Write a new rule in `retrieve.smk` named `retrieve_pudl` that calls `retrieve_pudl.py`
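
For illustration only, the download step in the second item could look roughly like the sketch below. It is not the code from the eventual PR: the function name `download_file` is hypothetical, the URL is left as a parameter because the build had not been chosen at this point, and it uses the standard library's `urllib` to stay dependency-free.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Stream `url` to `dest` without holding the whole file in memory.

    `download_file` is a hypothetical helper name, not taken from the PR.
    """
    # Create the output directory if the workflow has not done so already.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # copyfileobj reads and writes in blocks of `chunk_size` bytes.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest


# Assumed wiring inside a Snakemake `script:` module, where Snakemake injects
# a `snakemake` object (the parameter name `url` is an assumption):
#
#     download_file(snakemake.params.url, Path(snakemake.output[0]))
```

Streaming in blocks matters here because the PUDL database is large enough that reading it into memory in one call would be wasteful.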

ktehranchi commented 6 months ago

> Identify which build of the PUDL database to use, and grab url

I think the stable-builds from either Zenodo or the AWS buckets would work well for our purposes.

https://catalystcoop-pudl.readthedocs.io/en/latest/data_access.html#stable-builds

stephendeyoung commented 6 months ago

@jpvelez I've started this here: https://github.com/stephendeyoung/pypsa-usa/commit/5978f5d9b7db3009fd3892b4791b6d284f4680c0

I didn't have time to check the contribution guidelines so didn't create a PR yet.

I wasn't clear on how to add the retrieve_pudl rule to the Snakefile. None of the other retrieve_*** scripts are referenced in the Snakefile directly, but I can see that retrieve.smk is included there. Can you clarify?

ktehranchi commented 6 months ago

@stephendeyoung This looks great, thanks! Our contribution guide is out of date (#319) ... but you can submit the PR to the develop branch.

RE: how to add retrieve_pudl to the snakemake workflow: you have done it correctly. The new rule becomes part of the workflow because retrieve.smk is included in the Snakefile.

stephendeyoung commented 6 months ago

Thanks @ktehranchi. I've created the PR now. The PUDL db is still gzipped after the download. LMK if it needs to be uncompressed.

ktehranchi commented 6 months ago

Yep, it should be uncompressed. Thank you.

stephendeyoung commented 6 months ago

Ok that's done. It was a little more complex than anticipated because I had to decompress the file chunk by chunk. requests does this automatically when the response sets the correct headers, but that wasn't happening in this case.
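
A minimal sketch of chunk-by-chunk gzip decompression, assuming the download arrives as an iterable of raw byte chunks. This is not the code from the PR; the function name `gunzip_chunks` is hypothetical, and the standard library's `zlib` is used here as one way to do it.

```python
import zlib
from pathlib import Path
from typing import Iterable


def gunzip_chunks(chunks: Iterable[bytes], dest: Path) -> int:
    """Decompress gzip data that arrives in pieces; return bytes written.

    `gunzip_chunks` is a hypothetical helper name, not taken from the PR.
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    written = 0
    with open(dest, "wb") as out:
        for chunk in chunks:
            if not chunk:
                continue  # skip empty keep-alive chunks
            data = decompressor.decompress(chunk)
            out.write(data)
            written += len(data)
        # Emit anything still buffered inside the decompressor.
        tail = decompressor.flush()
        out.write(tail)
        written += len(tail)
    # eof is only set once the complete gzip trailer has been consumed.
    if not decompressor.eof:
        raise ValueError("gzip stream ended before its trailer was read")
    return written
```

With requests, the chunks could plausibly come from `requests.get(url, stream=True).iter_content(chunk_size=...)`, which yields the bytes still compressed when the server does not declare a content encoding. Only one chunk plus the decompressor's internal buffer is held in memory at a time.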

ktehranchi commented 6 months ago

Awesome, looks good! I will merge the PR, thank you!

We should be ready for #312 now.