zaneselvans closed this 2 years ago
Having a single source of truth for the metadata will help when we do things like enrich the TableSchemas (see #426) with additional constraints, units, etc. Once we've added that information in the One True Place, we'll know that it will propagate to everywhere that information appears.
Subsumed within #806.
Right now we have a single large metadata library, stored in `src/pudl/package_data/meta/datapkg/datapackage.json`. This collection of metadata contains a large amount of duplicated information, which makes it difficult to update and maintain while keeping it self-consistent and in line with the current state of the source code. Several potential improvements:

- Instead of naming the `id` field as the column that contains autoincrementing primary keys, maybe we should just enumerate which tables have autoincrementing keys and assume the key column will be `id`, rather than listing `id` a dozen times in the `autoincrement` metadata element.
- Many fields have an `enum` constraint, and the contents of that constraint should be specified in a single place and inserted by reference, rather than being enumerated in every separate field. Often these fixed vocabularies already correspond to a list or a set of dictionary keys in some item in the `pudl.constants` module.
- `encoding`, `mediatype`, `dialect`, and `format` are currently enumerated in every single resource descriptor, but could just be stated in one place and inserted by reference.
- There is a `sources` element which is (I think) overwritten by the `pudl.load.metadata` function. This is confusing.
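
To make the "state it once, insert by reference" idea concrete, here is a minimal sketch of how a compact metadata description could be expanded into full resource descriptors. Everything in it is an assumption for illustration: the names `RESOURCE_DEFAULTS`, `ENUMS`, `AUTOINCREMENT_TABLES`, `enum_ref`, `expand_resource`, and `build_autoincrement` are hypothetical, the table and field names are made up, and the shape of the `autoincrement` element (table name mapped to `id`) is a guess rather than something taken from the current `datapackage.json`.

```python
import copy

# Hypothetical shared values, stated once instead of in every resource descriptor.
RESOURCE_DEFAULTS = {
    "encoding": "utf-8",
    "mediatype": "text/csv",
    "format": "csv",
    "dialect": {"delimiter": ","},
}

# Hypothetical controlled vocabularies. In practice these might be derived
# from lists or dictionary keys that already live in pudl.constants.
ENUMS = {
    "fuel_type": ["coal", "gas", "oil"],
}

# Tables with an autoincrementing key; the key column is assumed to be "id".
AUTOINCREMENT_TABLES = {"example_table"}

# Compact, de-duplicated descriptions (made-up tables and fields).
COMPACT_RESOURCES = [
    {
        "name": "example_table",
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "fuel_type", "type": "string", "enum_ref": "fuel_type"},
        ],
    },
    {
        "name": "other_table",
        "fields": [{"name": "plant_name", "type": "string"}],
    },
]


def expand_resource(compact):
    """Build a full resource descriptor from a compact one."""
    # Deep copy so resources never share mutable defaults such as "dialect".
    resource = copy.deepcopy(RESOURCE_DEFAULTS)
    resource["name"] = compact["name"]
    fields = []
    for compact_field in compact["fields"]:
        field = copy.deepcopy(compact_field)
        # Replace the reference with the actual list of allowed values.
        enum_ref = field.pop("enum_ref", None)
        if enum_ref is not None:
            field.setdefault("constraints", {})["enum"] = list(ENUMS[enum_ref])
        fields.append(field)
    resource["schema"] = {"fields": fields}
    return resource


def build_autoincrement(compact_resources):
    """Derive the autoincrement element from the list of tables alone."""
    return {
        r["name"]: "id"
        for r in compact_resources
        if r["name"] in AUTOINCREMENT_TABLES
    }


datapackage = {
    "resources": [expand_resource(r) for r in COMPACT_RESOURCES],
    "autoincrement": build_autoincrement(COMPACT_RESOURCES),
}
```

With something along these lines, adding a value to a vocabulary or changing the file encoding would be a one-line change, and the fully expanded descriptor would be generated rather than edited by hand.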