catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Docs request: datapackage details onto readthedocs #564

Closed: karldw closed this issue 3 years ago

karldw commented 4 years ago

Is your feature request related to a problem? Please describe.
When I'm hunting for a particular variable, I'm often unsure which dataset it's in, or whether PUDL has a cleaned version.

Describe the solution you'd like
It would be great to have the readthedocs site describe the tables and variables that are present in the current Zenodo release. Things like table names, variable names, and years covered would be really useful. It seems like you've already done the hard part by writing the computer-readable datapackage.json descriptions. (I'm not familiar with the details, so I'm sure it's more difficult than I realize.)
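
A minimal sketch of the idea, assuming the descriptor follows the Frictionless Data Package layout: a top-level `resources` list whose entries carry a `name`, an optional `description`, and a Table Schema whose `fields` list gives each column's `name`, `type`, and `description`. The helpers `load_datapackage` and `render_data_dictionary` are hypothetical, not part of PUDL. The output is a reStructuredText fragment that a Sphinx build on readthedocs could include. "Years covered" is not a standard Table Schema property, so it is left out here.

```python
import json
from typing import Any


def load_datapackage(path: str) -> dict[str, Any]:
    """Read a datapackage.json descriptor from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def render_data_dictionary(package: dict[str, Any]) -> str:
    """Render one reST section per resource (table), listing its columns."""
    lines: list[str] = []
    for resource in package.get("resources", []):
        name = resource["name"]
        # Section title underlined to the same length, as reST requires.
        lines.append(name)
        lines.append("-" * len(name))
        if resource.get("description"):
            lines.append(resource["description"])
        lines.append("")
        # Table Schema keeps per-column metadata under schema.fields.
        for field in resource.get("schema", {}).get("fields", []):
            entry = f"* ``{field['name']}`` ({field.get('type', 'any')})"
            if field.get("description"):
                entry += f": {field['description']}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)
```

Writing `render_data_dictionary(load_datapackage("datapackage.json"))` to an `.rst` file would give one browsable section per table, generated from the same metadata that ships with the data.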

Describe alternatives you've considered
Just download the datapackage.json file and search.
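
The download-and-search workaround can be scripted along these lines. This is a hedged sketch under the same Data Package layout assumption; `find_field` and `search_file` are hypothetical helpers that do a case-insensitive substring match on each column's name and description and return `(table, column)` pairs.

```python
import json
from typing import Any


def find_field(package: dict[str, Any], query: str) -> list[tuple[str, str]]:
    """Return (table, column) pairs whose column name or description mentions query."""
    needle = query.lower()
    hits: list[tuple[str, str]] = []
    for resource in package.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            # Search the column name and its description together.
            haystack = f"{field['name']} {field.get('description', '')}".lower()
            if needle in haystack:
                hits.append((resource["name"], field["name"]))
    return hits


def search_file(path: str, query: str) -> list[tuple[str, str]]:
    """Load a datapackage.json from disk and search it."""
    with open(path, encoding="utf-8") as f:
        return find_field(json.load(f), query)
```

Matching on descriptions as well as names helps when the column name is abbreviated, at the cost of some false positives.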

Additional context
This issue particularly comes up for FERC Form 1, where some tables are left uncleaned.

zaneselvans commented 4 years ago

This is definitely doable and something I've been thinking about. A couple of examples of data package metadata posted on the web in a more readable form:

We've also been thinking about making RTD the authoritative documentation for what data we have available, how much of it there is, and any notes on how it was processed or its quirks, with roughly one dataset per page in the data catalog. That would come with a shopping list of data we want to bring in, explaining why each dataset would be useful or important, what one might be able to do with it, and how hard it would be to integrate.

cmgosnell commented 3 years ago

This is done now with our data dictionaries.