developmentseed / tecnico-energy-app

https://dashboard-ds-peach.vercel.app/

How to handle metrics classification inconsistencies at time of ingestion #27

Open alukach opened 3 months ago

alukach commented 3 months ago

Our system works by normalizing metrics into unique combinations of category, usage, and source. We look to the metrics_metadata table to determine which fields in the metrics table can be used to represent a given category, usage, and source datapoint for a given scenario.

For example, if we are interested in the total cost for a study, that would be represented by Category=Cost, Usage=, Source= (where a blank value implies all). To determine which field we would render for this data on the baseline scenario, we can look to our metrics_metadata table and find the row that contains a blank scenario column, a category column with a value of Cost, and a blank usage and source column. To find out how any given scenario would affect that value, we would look for a row with a scenario column matching the scenario of interest, a category column with a value of Cost, and a blank usage and source column.
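As a rough sketch of that lookup (only the table and column names `metrics_metadata`, `scenario`, `category`, `usage`, and `source` come from this issue; the data structures and function names are assumptions for illustration), resolving the metrics-table field for a given combination might look like:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataRow:
    scenario: str   # "" (blank) for the baseline scenario
    category: str
    usage: str      # "" (blank) implies all usages
    source: str     # "" (blank) implies all sources
    field: str      # column in the metrics table holding this datapoint


def resolve_field(
    rows: list[MetadataRow],
    category: str,
    usage: str = "",
    source: str = "",
    scenario: str = "",
) -> Optional[str]:
    """Return the metrics-table field for a (scenario, category, usage, source)
    combination, or None if no matching metadata row exists."""
    matches = [
        r.field
        for r in rows
        if (r.scenario, r.category, r.usage, r.source)
        == (scenario, category, usage, source)
    ]
    # More than one match is exactly the ambiguity raised in this issue.
    if len(matches) > 1:
        raise ValueError(
            f"Competing metadata rows for {scenario=}, {category=}, {usage=}, {source=}"
        )
    return matches[0] if matches else None


# Total cost for the baseline scenario: Category=Cost, blank usage/source/scenario.
# field = resolve_field(metadata_rows, category="Cost")
```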

However, we are noticing some data inconsistencies when reviewing Municipal Data v4:

  1. There is no row for the baseline scenario where category=Cost, usage=, source=.
  2. There are multiple rows for each scenario where category=Cost, usage=, source=.

This brings up the following questions:

  1. How to handle situations where a scenario provides a category, usage, source combination that doesn't exist in the baseline scenario? Possible solutions: ignore, throw an error
  2. How to handle situations where there are multiple rows within a given scenario with competing combinations of category, usage, and source? Possible solutions: accept first, throw an error
yellowcap commented 3 months ago

If it is not difficult to throw an error, I think that would be better. In these cases we could disable the study until the ingestion is complete and successful.

I would vote for always assuming the data is complete and consistent, and failing if it is not. The user will then have to edit their data until the ingestion works. This is essentially validation at ingestion time. Otherwise there is a high risk of ingesting unexpected data that might be inconsistent without the user noticing.
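A minimal sketch of that fail-fast validation at ingestion time, assuming the same `MetadataRow` shape as above (the two checks mirror the two questions raised in this issue; `IngestionError` and the function name are hypothetical):

```python
def validate_metadata(rows: list[MetadataRow]) -> list[str]:
    """Collect validation errors so the whole ingestion can be rejected at once."""
    errors: list[str] = []
    seen: set[tuple[str, str, str, str]] = set()
    baseline_combos: set[tuple[str, str, str]] = set()

    for r in rows:
        key = (r.scenario, r.category, r.usage, r.source)
        if key in seen:
            errors.append(f"Duplicate metadata row for {key}")
        seen.add(key)
        if r.scenario == "":
            baseline_combos.add((r.category, r.usage, r.source))

    # Every scenario-specific combination should also exist in the baseline.
    for r in rows:
        if r.scenario and (r.category, r.usage, r.source) not in baseline_combos:
            errors.append(
                f"Scenario '{r.scenario}' defines ({r.category!r}, {r.usage!r}, "
                f"{r.source!r}) but the baseline does not"
            )
    return errors


# errors = validate_metadata(metadata_rows)
# if errors:
#     raise IngestionError(errors)  # reject/disable the study until the data is fixed
```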

I will ask Ricardo to review these cases in the municipal data.