Open irm-codebase opened 1 week ago
I think this is a good idea. So the proposal is to allow mappings in this way, right?
rows: {column_name_in_data: dimension_name_in_model}
rather than just
rows: dimension_name_in_model
which requires the dimension name to appear exactly as-is in the data, and should fail in all other cases
Pretty much. If possible, though, I think supporting both is the best case.
Basically, if the type is a string, assume it's a match. If it's a dict, assume it's a mapping?
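A minimal sketch of that string-or-dict rule, to make the suggestion concrete. The function name `normalise_dim_spec` is hypothetical and is not part of Calliope; it only shows how both forms could be reduced to a single internal mapping of `{column_name_in_data: dimension_name_in_model}`.

```python
def normalise_dim_spec(spec):
    """Reduce a `rows`/`columns` entry to {name_in_data: dimension_name_in_model}.

    Hypothetical helper for illustration only, not Calliope's API.
    """
    if isinstance(spec, str):
        # Plain string: the name in the data must already match the model dimension.
        return {spec: spec}
    if isinstance(spec, dict):
        # Mapping: keys are names found in the data, values are model dimensions.
        return dict(spec)
    raise TypeError(f"Expected str or dict, got {type(spec).__name__}")


print(normalise_dim_spec("timesteps"))
print(normalise_dim_spec({"utc_timestamp": "timesteps"}))
```

With both forms normalised early, the rest of the loading code would only ever need to handle the dict form.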
@irm-codebase I'd argue that those different columns actually introduce some ambiguity. How is one to know that the ones not suffixed with _utc are in the UTC timezone? Generally, it makes a lot of sense to follow a standard index name format (e.g., there's a relatively strict set used by the climate community). It's so much easier to maintain tabular data if you follow a standard format.
Still, I'm willing to introduce this mapping for these edge cases.
Having different data types for a single config entry is always a pain to maintain and to document. I would prefer to have something like a mapping key, e.g.:
    data_sources:
      demand_elec_timeseries:
        source: timeseries/demand/electricity.csv
        columns: nodes
        rows: timesteps
        map_dims:
          timesteps: time
        add_dims:
          techs: demand_elec
          parameters: sink_use_equals
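As a rough sketch of how a separate `map_dims` key could be applied when loading: the function below is hypothetical (not Calliope's implementation), and it assumes that `map_dims` keys are model dimension names and values are the names used in the file, which is one reading of the example above. Names without a mapping must still match exactly, so the strict default is preserved.

```python
def resolve_index_names(index_names, expected_dims, map_dims=None):
    """Rename index names from a data file to model dimension names.

    Hypothetical helper. Assumes map_dims is
    {dimension_name_in_model: name_in_file}.
    """
    map_dims = map_dims or {}
    # Invert the mapping so lookups go from file name to model dimension.
    file_to_model = {file_name: dim for dim, file_name in map_dims.items()}
    resolved = [file_to_model.get(name, name) for name in index_names]
    # Stay strict: every dimension declared in `rows`/`columns` must be found.
    missing = [dim for dim in expected_dims if dim not in resolved]
    if missing:
        raise ValueError(f"Dimensions not found in data: {missing}")
    return resolved


# The file labels its index "time"; the config declares `rows: timesteps`.
print(resolve_index_names(["time"], ["timesteps"], {"timesteps": "time"}))
```

Without the `map_dims` entry, the same call raises a `ValueError`, which mirrors the current behaviour of rejecting data whose names do not match.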
I agree @brynpickering
Our default should be to assume the data is provided in the correct format. But adding a bit of extra flexibility should avoid extra code in certain cases.
I think your mapping approach is also better than my proposal, so no issues there!
What can be improved?
Being able to load tabular data is a wonderful new feature. But it is currently limited by forcing the data to 'match' Calliope's dimension names in certain cases. Enforcing strict naming by default makes a lot of sense: you avoid ambiguity and you also avoid the risks of relying on column position.
However, it will often be too inflexible.
Reasoning
Some of our names might lead to files being less human readable, or finicky:
- nodes is less informative than country
- technology or tech instead of techs, since it can be intuitive to name columns in singular
- time or utc_timestamp, over timesteps
- ...
This will lead to a lot of 'boilerplate' code that just shapes the data to fit Calliope's naming. See the following 3 examples for different names used for timeseries in Euro Calliope with v6.10:
All 3 are equally 'human' readable, but since they do not specify timesteps, they won't load into Calliope.
Proposal
An option to use mappings would solve this issue. For example, you could load one of the timeseries above this way:
This is still strict, but more flexible.
Version
v0.7.0.dev3