catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Expand descriptions of our most important tables #1908

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

We have a lot of intimate knowledge about the data we're producing, and we should share more of it in the table-level descriptions, so that users and collaborators can understand what the data is, and how to use it appropriately. Tables that could use more details include:

Data Tables

Entity Tables

Structural Tables

Codes & Metadata

Note: This documentation effort was partly inspired by a request for more context from the folks at CarbonPlan.

cmgosnell commented 1 year ago

For boiler_generator_assn_eia860, can we make pudl.transform.eia._boiler_generator_assn a public function and link to its documentation from the table description?

zaneselvans commented 1 year ago

Do you mean adding a link to the documentation of _boiler_generator_assn() from within the description of the table? We could definitely do that, but it would only show up correctly in the RTD documentation, where it would get parsed as RST. In the Datasette metadata and elsewhere it would be garbled, unless we added some RST parsing to the metadata export step for other formats.

We could also just use a bare URL. I think that would get turned into a link in RST and it would be obvious what it was in other contexts.
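As a rough sketch of the bare-URL idea, here is how an export step might swap RST cross-reference roles for plain URLs before writing non-RST metadata. The helper name `roles_to_bare_urls`, the `DOCS_BASE_URL` value, and the `module.html#dotted.name` anchor scheme are all assumptions for illustration, not anything PUDL currently does.

```python
import re

# Hypothetical docs root; a real export step would use the project's actual URL.
DOCS_BASE_URL = "https://example.invalid/docs/api"

# Matches Sphinx-style roles such as :func:`package.module.name`
ROLE_PATTERN = re.compile(r":(?:func|class|mod|meth):`~?([^`]+)`")


def roles_to_bare_urls(description: str, base_url: str = DOCS_BASE_URL) -> str:
    """Replace RST cross-reference roles with bare URLs readable in any format."""

    def _to_url(match: re.Match) -> str:
        dotted_name = match.group(1)
        # Everything before the last dot is treated as the module path.
        module = dotted_name.rpartition(".")[0] or dotted_name
        # The page and anchor layout here is an assumption for illustration only.
        return f"{base_url}/{module}.html#{dotted_name}"

    return ROLE_PATTERN.sub(_to_url, description)


text = "See :func:`pudl.transform.eia._boiler_generator_assn` for details."
print(roles_to_bare_urls(text))
```

Text without any roles passes through unchanged, so this could run over every table description regardless of whether it contains links.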

cmgosnell commented 1 year ago

Yeah, I'm suggesting linking them one way or another. I've definitely been thinking about it in the context of the RTD docs, but fair point about other formats.

I just don't want to have duplicate information about these tables in many places (read: hard to update), and it is much easier for us to maintain these things if the docs are close to the code.

zaneselvans commented 1 year ago

Definitely agree! It might not be hard to run the string through some rst-to-html function that already exists. Or we could add metadata indicating source functions. This is also the kind of information that Dagster should help us track and display.
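To make the "metadata indicating source functions" option concrete, here is a minimal sketch in which the description is stored once and each export format decides how to render the function references. The `TableDescription` class and its fields are hypothetical, not part of the existing PUDL metadata classes, and the description string is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class TableDescription:
    """Hypothetical container: a table description plus the functions that build it."""

    name: str
    description: str
    source_functions: list[str] = field(default_factory=list)

    def to_rst(self) -> str:
        """Render for Sphinx/RTD, where :func: roles become links."""
        if not self.source_functions:
            return self.description
        refs = ", ".join(f":func:`{fn}`" for fn in self.source_functions)
        return f"{self.description}\n\nProduced by: {refs}"

    def to_plain(self) -> str:
        """Render for Datasette or other non-RST consumers: dotted names only."""
        if not self.source_functions:
            return self.description
        refs = ", ".join(self.source_functions)
        return f"{self.description}\n\nProduced by: {refs}"


table = TableDescription(
    name="boiler_generator_assn_eia860",
    description="Placeholder description.",  # not the real table description
    source_functions=["pudl.transform.eia._boiler_generator_assn"],
)
print(table.to_rst())
print(table.to_plain())
```

Keeping the function names as structured metadata means the text lives in one place next to the code, and no RST ever leaks into formats that cannot parse it.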

bendnorman commented 1 year ago

I noticed the utility_plant_assn table does not have a description in our data dictionary. Should I create a new issue and PR to add a basic description for the table?