Closed zaneselvans closed 4 years ago
Some of the cleanup will happen in Issue #399, but in terms of adding things, I think it would be good to add `keywords` and `version`. For keywords, I assume we can have PUDL-level keywords and dataset-level keywords and squish them together according to which datasets are in the package. Should these be stored in constants or in the megadata file?
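A minimal sketch of the "squish them together" idea, assuming the keywords live in plain Python structures. The names `PUDL_KEYWORDS`, `DATASET_KEYWORDS`, and `package_keywords`, and all of the keyword values, are hypothetical placeholders and not anything that exists in PUDL:

```python
# Hypothetical keyword tables; the names and values are illustrative only.
PUDL_KEYWORDS = ["energy", "electricity"]
DATASET_KEYWORDS = {
    "eia923": ["eia", "generation", "fuel"],
    "ferc1": ["ferc", "utilities", "electricity"],
}


def package_keywords(datasets):
    """Combine project-level keywords with those of each dataset in the package."""
    merged = list(PUDL_KEYWORDS)
    for dataset in datasets:
        # Datasets without an entry simply contribute no keywords.
        merged.extend(DATASET_KEYWORDS.get(dataset, []))
    # dict.fromkeys drops duplicates while preserving first-seen order.
    return list(dict.fromkeys(merged))


print(package_keywords(["eia923", "ferc1"]))
```

The same function works whether the tables end up in constants or are loaded from the megadata file, so it doesn't settle the storage question either way.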
Anything else?
- `id` field to store the Zenodo DOI for the package, which can be reserved in advance of publication through the Zenodo API.
- `pudl-bundle-id` since it's not something that can be looked up in any other registry.
- `sources` list, are we actually allowed to have fields other than `title`, `path` and/or `email`? Right now it includes the ETL parameters for each of the sources as a dictionary as well (which I agree is good metadata to include -- just wondering where it should best go).
- `pudl-version`?

Need to break this out into several issues:
- `constants.py` so we can have a conversation about where and how to store the ENUMs and other small data structures, which aren't really code.

Okay. The `id` and the `version` can and will be added via #419 and #426. I'd like to say that all of the ENUM/constants mess should be considered not a part of this issue.
I've added `start_date` and `end_date` into the `sources`... from my understanding and experience, we can add any additional fields into the metadata. The `sources` are associated with the data package and with the `resources`. I also extracted the `start_date` and `end_date` from the sources and associated them with the `resources` as well. This may be too much.
Also worth noting: the `bundle_id_pudl` is now the UUID, which is used internally to check whether multiple data packages were generated as part of the same bundle of packages.
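For illustration, a check like that could look roughly like the sketch below. It assumes each data package descriptor is a dict with a top-level `bundle_id_pudl` key; the helper name `same_bundle` and the package names are made up for the example and are not PUDL's actual implementation:

```python
import uuid


def same_bundle(descriptors):
    """Return True if every descriptor carries the same, non-missing bundle ID."""
    ids = {d.get("bundle_id_pudl") for d in descriptors}
    return len(ids) == 1 and None not in ids


# One UUID is generated per ETL run and stamped on every package in that run.
bundle_id = str(uuid.uuid4())
pkg_a = {"name": "pkg-a", "bundle_id_pudl": bundle_id}
pkg_b = {"name": "pkg-b", "bundle_id_pudl": bundle_id}
pkg_c = {"name": "pkg-c", "bundle_id_pudl": str(uuid.uuid4())}

print(same_bundle([pkg_a, pkg_b]))  # packages from the same run
print(same_bundle([pkg_a, pkg_c]))  # packages from different runs
```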
If we extract all of these sub-issues, I think this issue should be closed now? @zaneselvans what do you think?
I think having the `start_date` and `end_date` for the data in the resource only associated with the resource is plenty, and it's usually better to just have one authoritative location.
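A rough sketch of what "one authoritative location" could mean in practice, assuming the dates are plain `start_date` / `end_date` keys on each entry of the package-level `sources` list and on each resource. The helper name `drop_package_level_dates` and the example values are hypothetical:

```python
DATE_KEYS = ("start_date", "end_date")


def drop_package_level_dates(descriptor):
    """Return a copy whose package-level sources carry no date range.

    Resources are left untouched, so they remain the only place
    where the dates are recorded.
    """
    cleaned = dict(descriptor)
    cleaned["sources"] = [
        {k: v for k, v in source.items() if k not in DATE_KEYS}
        for source in descriptor.get("sources", [])
    ]
    return cleaned


# Illustrative descriptor; all values are placeholders.
descriptor = {
    "name": "example-package",
    "sources": [
        {"title": "eia923", "start_date": "2009-01-01", "end_date": "2017-12-31"},
    ],
    "resources": [
        {"name": "example_table", "start_date": "2009-01-01", "end_date": "2017-12-31"},
    ],
}

cleaned = drop_package_level_dates(descriptor)
print(cleaned["sources"])
print(cleaned["resources"])
```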
Other than that, yeah I think all the other stuff is now covered in the other listed issues.
Oh but I do still owe you some keywords for the various data sources.
Along with #399 (removing extraneous / unused metadata) we need to update and clean up the Megadata JSON file, to get rid of all the "idfk" and other placeholders, and fill it up with good information, before we publish it.
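To find the leftover placeholders before publishing, something like the following could be run over the megadata. This is only a sketch: it assumes the file is ordinary JSON, `find_placeholders` is a hypothetical helper, and the sample document is invented for the example (the real file would be read with `json.load`):

```python
import json

# Strings treated as placeholders; extend with whatever else turns up.
PLACEHOLDERS = ("idfk",)


def find_placeholders(node, path="$"):
    """Recursively collect the paths of string values that are placeholders."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_placeholders(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            hits.extend(find_placeholders(value, f"{path}[{i}]"))
    elif isinstance(node, str) and node.strip().lower() in PLACEHOLDERS:
        hits.append(path)
    return hits


# Invented sample standing in for the megadata file contents.
megadata = json.loads(
    '{"title": "idfk", "contributors": [{"title": "Someone", "email": "idfk"}]}'
)
print(find_placeholders(megadata))
# → ['$.title', '$.contributors[0].email']
```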
Also need to add @gschivley as a contributor.