Build the input workflow

clizbe commented 10 months ago

Build the basic input workflow from raw data to the model.

See discussion #288

Considerations

[x] TulipaEnergy/TulipaIO.jl#1
[x] What other data sources need to be supported?
[x] How to merge data from different sources?
[ ] How to handle different units in source data? -> waiting for UnitsJuMP.jl
[x] Do we need a local data store?
- DuckDB? (favoured)
- SQLite?
[ ] TulipaEnergy/TulipaEnergyModel.jl#415
[ ] TulipaEnergy/TulipaEnergyModel.jl#414
[ ] Specify solver specifications
- [x] common options
- [ ] options unique to specific solvers
[x] TulipaEnergy/TulipaEnergyModel.jl#295

Meta considerations
[ ] Do we need parallel execution of pipelines?
[ ] Maybe supporting parallel jobs with shared inputs is sufficient

Capabilities/Usability requirements

[ ] able to visualise and inspect input/intermediate datasets
[ ] Scenario building
- [ ] change (enable/disable/limit) capacities
- [ ] easily specify scenario parameters for multiple scenarios
- need examples
- probably needs some kind of filter-and-apply API
- should be code, not config
[ ] run model unattended on a server/cluster
[ ] ability to compare & inspect multiple runs (e.g. different scenarios)

Related Issues

[x] TulipaEnergy/TulipaEnergyModel.jl#89

WHAT WE WANT Build the network once (in a while) Use draft networks to build new networks Sufficient flexibility for ad-hoc code for experimentation Definition of temporal stuff Definition of scenarios (what is included here?) Scope: just model or parts of pipeline (which parts?) Definition of solver specifications Be able to mix data sources (ESDL + ENTSO-E for example) Self-hosted Tulipa database (in case sources change/vanish, & reduce re-pulling/processing data) Export ESDL to simplified representation that is compatible with Tulipa

abelsiqueira commented 10 months ago

Does this includes the representative periods and the assets and flows partitions, or is it just for the data sources?

suvayu commented 10 months ago

The representative period comes from an algorithm, so that should be included, but optionally. A scenario might not require the algorithm, and use fixed periods instead, or the case where the algorithm has run once, and the input hasn't changed, then it need not run again.

As for the flow partitions, aren't they derivable from the profiles? If so, then that would also be along the lines of "compute if input changes".

clizbe commented 10 months ago

@Lokkij I tagged you on this one too if you're interested. You're of course our source for ESDL knowledge but I thought you might also be interested in this stuff. :)

suvayu commented 10 months ago

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

@clizbe I'm guessing you left that comment? Best to discuss in the thread instead of the editing top post.

I see that my wording is pretty unclear. AFAICT, there are two levels of filtering; the top-level includes stuff that are not in Tulipa because of fundamental modelling choices, e.g. no connections. So maybe then having the Port attributes in ESDL will never make sense. And the next level is any other finer choices that we make, which evolves with time.

In this case I mean the top-level fundamental choices. But maybe I'm over thinking it, and doing everything in one go is simpler.

clizbe commented 9 months ago

Yes I think some of it will be specifying the type of ESDL file that Tulipa accepts - which variables should be filled, etc. And then probably a step of converting that ESDL into the form that Tulipa likes, which will include throwing out anything else and maybe some conversion trickery. I would prefer if the ESDL file looks normal before conversion and that we don't build really weird ESDLs - but we'll see what works.

Lokkij commented 9 months ago

Is it possible to filter out attributes not used in Tulipa when exporting ESDL to JSON?

I thought we decided against this because that would be a choice on the Tulipa side and not the ESDL side?

Usually the approach here is to leave attributes in ESDL and simply not read them from the model if you don't need them. In our case, I would keep the filtering as close to Tulipa as possible. That will likely make it easier to write back results to ESDL while keeping the original attributes intact.

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

suvayu commented 9 months ago

On Tue, 28 Nov, 2023, 10:49 Wester Coenraads, @.***> wrote:

Do we need a local data store?

What would the local data store be used for? To store temporary in-between data, or something else?

As my understanding goes, for larger datasets we will have to connect to influxdb (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g different ways to compute representative days) etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

-- Suvayu

clizbe commented 9 months ago

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?

Lokkij commented 9 months ago

As my understanding goes, for larger datasets we will have to connect to influxdb (or similar) and download for Tulipa to read. There will also be intermediate steps (e.g different ways to compute representative days) etc. I doubt we want to download the dataset every time, or recompute unchanged steps every time.

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

Just saw this at a Spine meeting and thought it would be super handy to have something similar! (Maybe you had this in mind already, but it's new to me.) From what I understand it shows where specific data is coming from and the lines sort of indicate how it's processed?

To me this looks like a class diagram, very similar to the diagrams for ESDL. The ESDL documentation has diagrams for all classes, for example: https://energytransition.github.io/#router/doc-content/687474703a2f2f7777772e746e6f2e6e6c2f6573646c/PowerPlant.html

clizbe commented 9 months ago

@datejada @gnawin @clizbe Add some use-cases of how you're going to use the model and what your workflow is so they have a better idea of what we need. "I want to run the model from the train" is valid. :)

clizbe commented 9 months ago

Use Cases I would like to be able to:

summarize/visualize my input data (in tables or graphs), such as total wind capacity, transport line capacities, available technologies.
make transport capacities in certain areas unlimited, while still constraining others.
set up multiple scenarios to run in parallel or (otherwise) series - set and forget.
visualize output data from one scenario, as well as compare multiple scenarios.
keep track of what model version and what data was used for a particular run/analysis - reproducibility.
easily specify scenario parameters for multiple scenarios.
occasionally add new data / data sources.
specify which data sources to use to build a scenario.
run the model somewhere that I can go about other work while it runs.
know when the model is finished running.

My current workflow for running scenarios is:

Duplicate a "default" Access dataset - this has everything needed to do a run.
In Excel, process scenario-unique (new) data, so it works with the model.
In Access, filter for and delete any data that will be replaced by the new data.
Copy and paste the new data into the dataset.
Go into the model, Browse for the dataset, Load it, Run the model.
Check frequently if the model has finished running.
Export data to Excel to make graphs (although Wester is building a UI to make this nicer).

Pros/Cons of Access

Can easily see data (once you know where it is)
Easy to learn how to edit
Takes a long time to edit
Sometimes you don't know where the data is
Huge tables make it slow even loading/filtering

suvayu commented 9 months ago

Ah, so a sort of local DB to store data while doing other operations? I wouldn't expect our data to be so big as to need it, honestly - you can fit a lot of profiles in a few GBs of RAM. But maybe I'm missing something?

I guess that's pretty small. However I would really like to support a workflow that doesn't necessitate you to be online. But if people say there's no such need, we can drop it.

Edit: more I think about it, I think we need it, e.g. for running different scenarios it makes no sense to download the same data repeatedly even if it is small. So the question is, should the local store also be accessible to normal users for inspection and analysis. And based on @clizbe's points, I think it should be.

suvayu commented 9 months ago

Pros/Cons of Access

Can easily see data (once you know where it is)

Easy to learn how to edit

Takes a long time to edit

Sometimes you don't know where the data is

Huge tables make it slow even loading/filtering

@clizbe Do you know SQL? Is it fair to expect someone who is doing analysis to know/learn a bit of SQL?

clizbe commented 8 months ago

@suvayu Sorry I don't know if I responded in person. Learning SQL is totally feasible. I don't think our current modellers know it. (I've used it once.)

clizbe commented 7 months ago

Compiling the model takes a lot of time (Julia thing) with future runs going faster. How are we dealing with this in the workflow? Is the stable version of Tulipa something that compiles once and then can take any data through it? Or will the scenario define a model that needs precompiling before doing multiple runs?

suvayu commented 7 months ago

I think this request needs to be separated according to use case. For example, if you changed an input dataset, naively, you have to rerun. However if you say "I'm doing a sensitivity study, and my changes are only limited to X" then theoretically the repetitions need not start from scratch. But I think that's a very advanced feature which requires deep technical research. AFAIU, this is in @g-moralesespana and @datejada's wishlist (GUSS in GAMS). But there could be simpler use cases between these two extremes.

That said, I'm not sure whether this would fall under the purview or pipeline/workflow or model building. My hunch is, it'll depend on the use case.

I hope that makes sense :⁠-⁠P

clizbe commented 7 months ago

Yeah I figured I'd comment here in case it's a simple answer, but it's probably a bigger discussion.

This is becoming an issue with Spine, so it's good to think about it early.

datejada commented 4 months ago

For the ENTSOE data base I found this, but I'm not sure if we have access (or if we could have)...it might be interesting to explore it...

https://www.linkedin.com/posts/activity-7140005469414133760-f4XH/?utm_source=share&utm_medium=member_desktop

datejada commented 4 months ago

@nope82 commented the following about ENTSO-E:

From just a quick check it seems that this PEMMDB is only accessed by TSOs (Author’s comment: “Sadly no, (data transparency) it is only for sharing between TSO members”. When looking for access to the data, only found a reglament from the EERA study from ACER asking for the PEMMDB data :

“On 23 November 2021, ACER requested ENTSO-E to provide all input data for the ERAA 2021. On 2 December 2021, ENTSO-E provided ACER with access to the pan-European market modelling database (PEMMDB) and the assumptions for the economic viability assessment (EVA)”.

So it seems that ENTSO-E would be the only one that could give access to it, and also seems to be one-time thing access for specific data (or need to do recurrent request access) instead of a completely open access to the data probably

clizbe commented 1 month ago

@clizbe Reorganize the info here and close this issue

TulipaEnergy / TulipaEnergyModel.jl

Build the input workflow #289

Considerations

Meta considerations

Capabilities/Usability requirements