Closed by zaneselvans 4 years ago.
The whole reason these little two-column abbr tables exist is to map an abbreviation to a full name. But that's dumb. We should just store the full, human-readable name and do away with the abbreviation entirely. The need for these tables would then go away, replaced by a translation of the reported codes into human-readable full names in the transform step, with the resulting fixed list of acceptable values making up an ENUM type. This should probably be done in conjunction with the move from the DB to the data packages. @cmgosnell, how and when do you think we should deal with this? Should we modify the database structure and transform step now, so that the metadata extracted for the data packages is correct? Or should we wait until after the transition, and then simplify the JSON TableSchema by hand while altering the transform step?
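A minimal sketch of what the transform-step translation might look like, assuming a plain Python dict replaces each abbr table. The codes and names in `PRIME_MOVER_NAMES` are an illustrative subset, and `expand_codes` and `PRIME_MOVER_ENUM` are hypothetical names, not existing PUDL functions. The point is that the sorted set of full names becomes the fixed list backing the ENUM, and unknown codes fail loudly instead of slipping through.

```python
# Hypothetical, illustrative subset of reported prime mover codes mapped
# to full names; the real mapping would be maintained in one place.
PRIME_MOVER_NAMES = {
    "ST": "steam turbine",
    "GT": "combustion (gas) turbine",
    "WT": "wind turbine",
    "PV": "photovoltaic",
}

# The fixed list of acceptable values that would back the ENUM type.
PRIME_MOVER_ENUM = sorted(PRIME_MOVER_NAMES.values())


def expand_codes(codes, mapping):
    """Translate reported codes into full names during the transform step.

    Codes are stripped of whitespace and upper-cased before lookup.
    Any code missing from the mapping raises ValueError instead of
    passing through silently.
    """
    normalized = [code.strip().upper() for code in codes]
    unknown = sorted(set(normalized) - set(mapping))
    if unknown:
        raise ValueError(f"unrecognized codes: {unknown}")
    return [mapping[code] for code in normalized]


if __name__ == "__main__":
    print(expand_codes(["ST", " pv"], PRIME_MOVER_NAMES))
    # → ['steam turbine', 'photovoltaic']
```

With this approach the abbreviation never reaches the database, so there is nothing left for an abbr table to look up.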
It does seem like there's going to be an issue of duplicated data in here somewhere: several fields that exist across various tables should be subject to the same ENUM constraints. How do we avoid having to update every last one of them by hand whenever the list of acceptable values changes?
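One possible answer, sketched under assumptions: define each shared ENUM exactly once and generate every field's constraint from that single definition, rather than writing the allowed values into each table's schema by hand. The output follows the Frictionless Table Schema convention of a field-level `constraints` object with an `enum` key. The registry names (`ENUMS`, `ENUM_FIELDS`) and the tables `table_a` and `table_b` are hypothetical placeholders.

```python
import json

# Single source of truth: each shared ENUM is defined exactly once.
# The ENUM name and its values are hypothetical placeholders.
ENUMS = {
    "prime_mover": [
        "combustion (gas) turbine",
        "photovoltaic",
        "steam turbine",
        "wind turbine",
    ],
}

# Hypothetical registry of which (table, field) pairs use which shared ENUM.
ENUM_FIELDS = {
    ("table_a", "prime_mover"): "prime_mover",
    ("table_b", "prime_mover"): "prime_mover",
}


def field_descriptor(table, name, type_):
    """Build a Table Schema field descriptor, attaching a shared enum if any."""
    descriptor = {"name": name, "type": type_}
    enum_name = ENUM_FIELDS.get((table, name))
    if enum_name is not None:
        # Copy the list so editing one schema can't alter the shared definition.
        descriptor["constraints"] = {"enum": list(ENUMS[enum_name])}
    return descriptor


def table_schema(table, fields):
    """Generate a minimal Table Schema from (name, type) pairs."""
    return {"fields": [field_descriptor(table, n, t) for n, t in fields]}


if __name__ == "__main__":
    print(json.dumps(table_schema("table_a", [("plant_id", "integer"),
                                              ("prime_mover", "string")]), indent=2))
    print(json.dumps(table_schema("table_b", [("prime_mover", "string")]), indent=2))
```

Changing the list in `ENUMS` once would update every generated schema. The same list could presumably also feed a database ENUM column type (for example SQLAlchemy's `Enum`), keeping the DB and the data packages in sync.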
The following tables and fields currently lack metadata. The abbr ones should probably be turned into readable descriptive tags that are part of an ENUM (containing whatever is in the value field they refer to), and ultimately stripped out of the database structure entirely.
boiler_generator_assn_eia860
fuel_type_eia923
fuel_type_aer_eia923
prime_movers_eia923
energy_source_eia923
natural_gas_transport_eia923
transport_modes_eia923
fuel_receipts_costs_eia923
entities.py file