calliope-project / calliope

A multi-scale energy systems modelling framework
https://www.callio.pe
Apache License 2.0

Add method to load data sources into the model. #532

Closed brynpickering closed 5 months ago

brynpickering commented 6 months ago

Fixes #92

Summary of changes in this pull request:

This implementation required some code in calliope.preprocess.model_data to align data loaded from file with data defined in YAML. Ideally this would be cleaner than it is, but it works for now. The approach I'm taking is:

  1. Load all data from file into one dataset
  2. Create a dummy dict from this with empty tech definitions and empty node definitions with relevant techs attached to those nodes (based on what data is defined from file). Both of these are applied to the user-defined YAML to not mess up the YAML->dataset code.
  3. Create another dummy dict with base tech data (base_tech and carrier_in/out - if included in data from file) which goes at the bottom of the tech inheritance chain. I didn't put this in (2) as I don't want it to override some user YAML definition (e.g., carrier_in is changed by a YAML override compared to what is loaded from file).
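The three steps above can be sketched roughly as follows. This is a minimal illustration using plain dictionaries, not the code in calliope.preprocess.model_data: the `deep_merge` helper, the tech and node names, and the flat merge standing in for the tech inheritance chain are all assumptions made for the example.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Step 1 (assumed already done): data loaded from file tells us which
# (tech, node) pairs exist and which base tech data was provided.
# All names below are hypothetical.
file_tech_nodes = {("ccgt", "region1"), ("csp", "region2")}
file_base_data = {
    "ccgt": {"base_tech": "supply", "carrier_out": "power"},
    "csp": {"base_tech": "supply", "carrier_out": "power"},
}

# Step 2: empty tech definitions, plus empty node definitions with the
# relevant techs attached to those nodes.
dummy_definitions = {"techs": {tech: {} for tech, _ in file_tech_nodes}, "nodes": {}}
for tech, node in sorted(file_tech_nodes):
    dummy_definitions["nodes"].setdefault(node, {"techs": {}})["techs"][tech] = {}

# Step 3: base tech data sits at the bottom so that it never overrides YAML.
base_definitions = {"techs": file_base_data}

# User YAML (already parsed) overriding one value that also came from file.
user_yaml = {"techs": {"ccgt": {"carrier_out": "heat", "flow_cap_max": 100}}}

# Lowest priority first: base data from file < dummy definitions < user YAML.
model_def = deep_merge(deep_merge(base_definitions, dummy_definitions), user_yaml)
```

Here the YAML value of carrier_out ("heat") wins over the value loaded from file, while base_tech is still filled in from the file data.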

## REMAINING ISSUES

- ~~Currently, it is very difficult to ensure any amount of YAML definition can be handled. If there is minimal info provided in YAML (e.g., one specific parameter override for one tech) then you have no info available about which techs exist at which nodes except for what you have provided in data_sources. For the national scale example I've set up, inferring which techs are defined at which nodes gets messed up by array broadcasting of flow_cap_max, making the model think that all techs are defined at all nodes.~~ EDIT: I think this is fixed.
- ~~We probably don't want the national scale example data duplicated in CSV in the calliope module itself. Perhaps we move this to tests?~~

TODO

- [ ] Make configurable both the order of overrides (YAML > data sources or data sources > YAML) and the exception behaviour on clashes between data sources and between YAML and data sources (both currently set to silently override). EDIT: leaving the "YAML > data sources" order as-is and non-configurable.

Reviewer checklist:

codecov[bot] commented 6 months ago

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (33b0672) 95.19% compared to head (e4a7137) 95.65%.

:exclamation: Current head e4a7137 differs from pull request most recent head 15f102c. Consider uploading reports for the commit 15f102c to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #532      +/-   ##
==========================================
+ Coverage   95.19%   95.65%   +0.46%
==========================================
  Files          24       25       +1
  Lines        3306     3450     +144
  Branches      706      683      -23
==========================================
+ Hits         3147     3300     +153
+ Misses         92       85       -7
+ Partials       67       65       -2
```

| [Files](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project) | Coverage Δ | |
|---|---|---|
| [src/calliope/attrdict.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL2F0dHJkaWN0LnB5) | `96.48% <100.00%> (ø)` | |
| [src/calliope/backend/backend\_model.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL2JhY2tlbmQvYmFja2VuZF9tb2RlbC5weQ==) | `97.67% <100.00%> (+0.01%)` | :arrow_up: |
| [src/calliope/core/io.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL2NvcmUvaW8ucHk=) | `94.79% <100.00%> (+0.34%)` | :arrow_up: |
| [src/calliope/core/model.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL2NvcmUvbW9kZWwucHk=) | `94.76% <100.00%> (+0.05%)` | :arrow_up: |
| [src/calliope/examples.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL2V4YW1wbGVzLnB5) | `100.00% <100.00%> (ø)` | |
| [src/calliope/postprocess/postprocess.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL3Bvc3Rwcm9jZXNzL3Bvc3Rwcm9jZXNzLnB5) | `90.32% <100.00%> (ø)` | |
| [src/calliope/preprocess/data\_sources.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL3ByZXByb2Nlc3MvZGF0YV9zb3VyY2VzLnB5) | `100.00% <100.00%> (ø)` | |
| [src/calliope/util/schema.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL3V0aWwvc2NoZW1hLnB5) | `90.32% <100.00%> (ø)` | |
| [src/calliope/preprocess/model\_data.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL3ByZXByb2Nlc3MvbW9kZWxfZGF0YS5weQ==) | `99.34% <97.77%> (-0.66%)` | :arrow_down: |
| [src/calliope/preprocess/time.py](https://app.codecov.io/gh/calliope-project/calliope/pull/532?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project#diff-c3JjL2NhbGxpb3BlL3ByZXByb2Nlc3MvdGltZS5weQ==) | `95.78% <95.23%> (+7.21%)` | :arrow_up: |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/calliope-project/calliope/pull/532/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=calliope-project)

brynpickering commented 5 months ago

@sjpfenninger for carrier_in and carrier_out as well as to and from, perhaps we can let them work in a special way...

carrier_in/_out:

The user could provide it without the carriers dimension as they do in YAML. E.g.:

| techs | carrier_in | carrier_out |
|---|---|---|
| supply_tech | | foo |
| supply_tech | | bar |
| demand_tech | baz | |
| conversion_tech | [foo, bar] | baz |

We would then enforce this format when loading the data and process it back into a dictionary to merge into the traditional model definition.

Multiple carriers in a list would need special parsing as they would likely be loaded in as strings ("[foo, bar]") and would need processing back to lists of strings.
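A rough sketch of that parsing step, assuming cells arrive as plain strings. The function name `parse_carriers` is hypothetical and not part of the Calliope API; it only shows how a string such as "[foo, bar]" could be turned back into a list of strings.

```python
from typing import List, Union


def parse_carriers(raw: str) -> Union[str, List[str]]:
    """Turn a cell such as "[foo, bar]" into ["foo", "bar"].

    Cells without surrounding brackets are returned as a stripped string.
    """
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        items = text[1:-1].split(",")
        # Drop empty items and any quotes left over from the file format.
        return [item.strip().strip("'\"") for item in items if item.strip()]
    return text


single = parse_carriers("baz")
multiple = parse_carriers("[foo, bar]")
```

A stricter version could reject cells with unbalanced brackets instead of passing them through as single carrier names.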

to/from

As with carrier_in/_out, we let users define it as they would in YAML. However, there is the added step that we would need to identify transmission technologies once all data files are loaded and a dummy "traditional" model definition has been created. Then we would go back to the loaded data files and make a check that no parameters were defined over the nodes dimension for transmission technologies. This would then emulate the YAML loading checks, which do not allow a transmission technology to be defined at a node.
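The check described above might look something like the sketch below. It assumes the loaded data has been flattened into simple records and that transmission technologies can be recognised by `base_tech: transmission` in the dummy model definition; the record layout, the function name and the example techs are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def find_transmission_node_clashes(
    tech_defs: Dict[str, dict], records: List[Dict[str, Optional[str]]]
) -> List[Tuple[str, str]]:
    """Return (tech, parameter) pairs where a transmission tech has data over nodes."""
    transmission = {
        name for name, tech in tech_defs.items() if tech.get("base_tech") == "transmission"
    }
    clashes = {
        (rec["techs"], rec["parameter"])
        for rec in records
        if rec["techs"] in transmission and rec.get("nodes") is not None
    }
    return sorted(clashes)


# Dummy "traditional" model definition built after all files are loaded.
tech_defs = {
    "ac_line": {"base_tech": "transmission", "from": "region1", "to": "region2"},
    "ccgt": {"base_tech": "supply"},
}

# Flattened view of the loaded files; `nodes` is None when the parameter
# was not defined over the nodes dimension.
records = [
    {"parameter": "flow_cap_max", "techs": "ccgt", "nodes": "region1"},
    {"parameter": "flow_cap_max", "techs": "ac_line", "nodes": "region1"},
    {"parameter": "flow_out_eff", "techs": "ac_line", "nodes": None},
]

clashes = find_transmission_node_clashes(tech_defs, records)
```

Any non-empty result would then be turned into an error, mirroring the YAML loading checks.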

The loop back to do the nodes check could be a pain, but manageable I think.

pros/cons

pros

cons

brynpickering commented 5 months ago

@sjpfenninger the solution I opted for was to allow to/from to be defined as text in file, to keep carrier_in/out restricted to boolean values, and to raise an error if a transmission technology defines data over nodes in the data loaded from file. This seems to me a reasonable compromise that avoids needing a separate method to load data from file (i.e., YAML-esque data that would have to be processed separately into a dictionary).

brynpickering commented 5 months ago

Docs added in #538