catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

PUDL Release v2022.11.30 #2077

Closed: zaneselvans closed this issue 1 year ago

zaneselvans commented 1 year ago

As soon as the nightly builds succeed on dev, we'll be ready to merge into main and tag a new PUDL release that includes all 2021 data for all of our covered datasets.

Release Checklist / Notes

Since we want to automate the release process, I'm trying to catalog everything I do here...

jdangerx commented 1 year ago

@zaneselvans - seems like v2022.11.30 is released, at least on Zenodo. Are we good to merge dev into main, close this issue, etc? Do you still need to do 2i2c stuff?

Also, is this the process you want help streamlining?

zaneselvans commented 1 year ago

@jdangerx yes, this is a big part of the semi-manual release process that needs streamlining. The other big piece, which @zschira has been looking at, is on the data acquisition end, in the pudl-archiver repository and #1418.

I'm torn on the JupyterHub. If we're not going to update it, then we should remove it from the documentation. I do think a resource like this is (or would be) useful, so I should just go ahead and update it; it should only take 10 minutes. I just ran out of steam.

jdangerx commented 1 year ago

Tada!

zaneselvans commented 1 year ago

However, I'm pretty sure we still need to upload the new data to the JupyterHub. If you attempt to run the example notebooks in the new Docker container, I believe they'll fail, since the data on the hub is from the prior release.

But this should hopefully be much faster & easier now that I can pull it down directly from the S3 bucket without needing to do any authentication.
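For reference, a minimal sketch of what that unauthenticated pull could look like from Python with boto3. The bucket and key names below are placeholders, not the actual build output locations:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Anonymous S3 client: no AWS credentials needed for a public bucket.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# Placeholder bucket/key; substitute the real nightly build output path.
s3.download_file(
    Bucket="example-pudl-builds",
    Key="v2022.11.30/pudl.sqlite",
    Filename="pudl.sqlite",
)
```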

jdangerx commented 1 year ago

@zaneselvans how do we do that upload?

zaneselvans commented 1 year ago

I usually log in to the JupyterHub, open a terminal within JupyterLab, and download the files from wherever they are on the internet. Historically this has been from Zenodo, which has been flaky and slow. But now that we've got the build outputs in a publicly accessible bucket with no authentication required, it should be much faster and easier. We still need to install the AWS CLI on the hub to do recursive downloads, and should probably add that to the Docker container rather than installing it manually.
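Until the AWS CLI is baked into the image, a rough Python-only equivalent of a recursive download could be sketched with boto3 alone (again, the bucket name and prefix are placeholders, not the real ones):

```python
from pathlib import Path

import boto3
from botocore import UNSIGNED
from botocore.config import Config


def download_prefix(bucket: str, prefix: str, dest: Path) -> None:
    """Download every object under a prefix from a public S3 bucket."""
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = dest / obj["Key"]
            target.parent.mkdir(parents=True, exist_ok=True)
            s3.download_file(bucket, obj["Key"], str(target))


# Placeholder bucket and prefix; substitute the real release outputs.
download_prefix("example-pudl-builds", "v2022.11.30/", Path("pudl_data"))
```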

I've got it downloaded to the hub now and am mopping up the old versions and putting the files in the right places.

zaneselvans commented 1 year ago

Okay, it's all updated now.