danielhuppmann opened 6 days ago
Maybe this was already implemented in #397, please double-check.
Indeed, I was checking that today with some tests; I will confirm tomorrow.
Follow-up, because I did some tests myself: wildcards (`*`) in variable names work, but the units are not checked. There may also be some difficulties here because multiple VariableCode items can match the same variable, e.g.,
```yaml
- Capital Cost|Hydrogen|*:
    description: ...
    unit: USD_2010/kW
- Capital Cost|Hydrogen|Fossil*:
    description: ...
    unit: EUR_2020/kW
```
Not saying that this makes sense, but if there is now a variable "Capital Cost|Hydrogen|Fossil|Coal" in an IamDataFrame, it's not clear which unit should apply...
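To make the ambiguity concrete, here is a minimal sketch using Python's `fnmatch` as a stand-in for whatever matching logic nomenclature actually uses (the code names and units are taken from the example above):

```python
from fnmatch import fnmatchcase

# Hypothetical wildcard codes with their units, as in the example above
wildcard_codes = {
    "Capital Cost|Hydrogen|*": "USD_2010/kW",
    "Capital Cost|Hydrogen|Fossil*": "EUR_2020/kW",
}

variable = "Capital Cost|Hydrogen|Fossil|Coal"

# Both patterns match, so the expected unit for this variable is ambiguous
matches = [code for code in wildcard_codes if fnmatchcase(variable, code)]
print(matches)  # ['Capital Cost|Hydrogen|*', 'Capital Cost|Hydrogen|Fossil*']
```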
@danielhuppmann, thanks for checking. I also looked at the code in detail now, and I think there are a couple of different ways we could go about the unit-ambiguity issue that you mentioned.
In the interest of keeping patterns as simple as possible and avoiding ambiguity as much as possible, I'd suggest option 3.
3 is a nice idea, but it will probably take a bit more time to implement.
So I suggest implementing a simple rule: if the variable to be validated matches the wildcard-codelist, the unit must match (which might cause issues in corner cases, but those are probably not relevant in practice). Then add a sanity check, to be called during `validate-project`, that wildcard-codes do not have well-defined duplicates.
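A hedged sketch of that simple rule, again using `fnmatch`-style matching as a stand-in (the function name and matching semantics are illustrative, not the actual nomenclature API):

```python
from fnmatch import fnmatchcase

def unit_is_valid(variable, unit, wildcard_codes):
    """Illustrative check: if the variable matches a wildcard code,
    the reported unit must equal that code's unit."""
    for pattern, expected_unit in wildcard_codes.items():
        if fnmatchcase(variable, pattern):
            return unit == expected_unit
    return False  # the variable matches no wildcard code at all

codes = {"Capital Cost|Hydrogen|*": "USD_2010/kW"}
print(unit_is_valid("Capital Cost|Hydrogen|Electrolysis", "USD_2010/kW", codes))  # True
print(unit_is_valid("Capital Cost|Hydrogen|Electrolysis", "EUR_2020/kW", codes))  # False
```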
I have read your suggestion a couple of times now, but I fail to understand how it differs from my point 3. What I was describing as this additional check is what I believe you are calling a "sanity check". I'd implement it as a (surprise, surprise) pydantic validator for `VariableCodeList`: check whether any wildcard variable matches any other wildcard variable.
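A sketch of such an overlap check, written here as a plain function rather than an actual pydantic validator, and again assuming `fnmatch`-style wildcard semantics:

```python
from fnmatch import fnmatchcase

def overlapping_wildcard_codes(code_names):
    """Return pairs (a, b) where wildcard code `a` also matches
    wildcard code `b`, i.e. the two codes overlap."""
    return [
        (a, b)
        for a in code_names
        for b in code_names
        if a != b and fnmatchcase(b, a)
    ]

codes = ["Capital Cost|Hydrogen|*", "Capital Cost|Hydrogen|Fossil*"]
print(overlapping_wildcard_codes(codes))
# [('Capital Cost|Hydrogen|*', 'Capital Cost|Hydrogen|Fossil*')]
```

In a real validator, a non-empty result would raise a `ValueError` so that the overlapping codes are rejected when the codelist is parsed.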
Sorry for not being clear. Parsing a DataStructureDefinition for large projects already takes quite some time, so adding yet another pydantic validator (executed every time) might not be the smartest move. Hence my suggestion to implement it as a validation method that is not executed when initializing the DataStructureDefinition but only as part of the `validate-project` CLI (so, for example, as part of GitHub Actions in a workflow repository).
> Parsing a DataStructureDefinition for large projects is already taking quite some time, so adding yet another pydantic-validator (executed every time) might not be the smartest move.
Without having run any benchmarks, doesn't reading in data, which we usually do when using nomenclature, typically take order(s) of magnitude longer? Where is the performance of the validators currently an issue? Do you mean in the scenario processing, in the testing of PRs, running locally, ...?
To allow more flexibility for reporting of technical parameters, we want to allow "wildcard-codes", echoing the wildcard-implementation in pyam using `*`. Concept: a VariableCode can be defined as
The validation-method should then accept any variable that matches the code-name including any string for the wildcard.
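One way to implement that matching, sketched here by translating the code name into a regular expression; this is an assumption about the approach, and the actual implementation may differ:

```python
import re

def variable_matches_code(code_name, variable):
    # Escape everything in the code name (including the '|' separators),
    # then turn the escaped wildcard back into a regex '.*'
    pattern = re.escape(code_name).replace(r"\*", ".*")
    return re.fullmatch(pattern, variable) is not None

print(variable_matches_code("Capital Cost|Hydrogen|*", "Capital Cost|Hydrogen|Fossil|Coal"))  # True
print(variable_matches_code("Capital Cost|Hydrogen|*", "Capital Cost|Electricity"))  # False
```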
This can follow the implementation by @phackstock here https://github.com/IAMconsortium/nomenclature/blob/f210213ccf51e8f70e70cd3e1715f273cb100f0c/nomenclature/config.py#L34
To be explicit, any of the following variables should pass validation: