TulipaEnergy / TulipaEnergyModel.jl

An energy system optimization model that is flexible, computationally efficient, and academically robust.
Apache License 2.0

Refactor the model #701

Open · clizbe opened 3 months ago

clizbe commented 3 months ago

Discussed in https://github.com/TulipaEnergy/TulipaEnergyModel.jl/discussions/688

Originally posted by **datejada** July 1, 2024

### Overview of the changes

- [ ] Use TulipaIO instead of directly accessing the data
  We need to generalize the data access and move the logic to TulipaIO. A large part depends on DuckDB-specific things (e.g., the connection). Maybe we'll need to create a TulipaIO structure?
- [ ] Remove the dependency on data stored in the graph
  In preparation for removing this data from the graph, we can initially just use the data from some other place.
- [ ] Remove the data from the graph
  Don't save data in the graph. After the issue "Remove the dependency on graph data", this will be simple.
- [ ] #885
  We can control and improve both the creation of the data/tables and the `create_model` function better if these concerns are separated. This includes the "expression" data, e.g., data that informs the incoming and outgoing flow.
  - This includes clustering stuff (representative periods, timeframe, years, etc.)
  - Partitions should also be created before the model, though we are missing the "scenario specification" data as a separate entity
- Sets
  - [x] #892
- Variables
  - [x] #884
    The way that we control the indexing - to prevent sparsity issues - is to precompute the indexes of the variables. Right now we do that by computing tables of these indexes. Assuming that remains the case, we have to define what is necessary for these tables and define their names. These tables will probably also store the final result, so they are used outside of the model as well.
    - If we keep these tables as DuckDB tables, the columns that currently hold ranges such as `1:3` have to be split into two columns (?), e.g., 1 and 3.
    - Maybe it makes sense to keep these tables (only) as DataFrames, so we can keep storing ranges and possibly JuMP variables in them.
    - Create a TulipaVariable struct that stores the DuckDB table with the indexes (i.e., each row corresponds to an index) and also stores a Vector of JuMP variables. A rough sketch follows below.
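      A possible shape for that struct, as a sketch only - the field names, types, and the helper function are assumptions, not a settled design:

      ```julia
      using DataFrames, JuMP

      # Sketch: `indices` is precomputed once; `container[i]` is the JuMP variable
      # corresponding to row i of `indices`.
      struct TulipaVariable
          indices::DataFrame
          container::Vector{VariableRef}
      end

      # Hypothetical helper: create one JuMP variable per precomputed index row.
      function attach_variables!(model::Model, var::TulipaVariable, name::String)
          append!(
              var.container,
              [@variable(model, base_name = "$name[$i]") for i in 1:nrow(var.indices)],
          )
          return var
      end
      ```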
  - [x] #898
  - [x] #901
  - [x] #903
  - [ ] Create investment variables using the indices in the variable structure
  - [ ] #923
    This involves:
    - moving the construction from `construct_dataframes` to `compute_variables_indices`
    - changing the many places that access the variables through `dataframes` to instead unpack `x = model[:x]`, or to use `variables`. We should try both strategies and see which makes more sense.
  - [ ] Move JuMP variables out of the DF
  - [ ] #909
    Make sure that we differentiate them from the input tables in DuckDB/TulipaIO.
- Constraints and expressions
  - [ ] Define (balance) constraints tables
    The (balance) constraints work per asset, and the table defines the necessary information for each of these assets. In the current implementation, one of the columns of this table is the JuMP expression with the incoming flow (and one for the outgoing flow, and possibly more for extra stuff). Constructing the incoming and outgoing flows is a huge issue: it was slow, and we had to figure out a nice way to do it. Furthermore, if we want to use an external table format, we can't save the expressions in the table, so the format will need to be changed. Whatever we do, we must be sure that it doesn't compromise performance.
    - ConstraintPartition struct
      - Table: asset, clustering (rp, tf, year), time blocks
      - Vectors for incoming, outgoing
  - [ ] #895
    Do we need something separate for partitions in general, or are the structs above sufficient?
  - [ ] Reimplement all "add_expression..." functions
    The add_expression functions receive an asset-based constraint (e.g., balance) and the flow variable, and compute all incoming and outgoing flows for each row of the constraint. The current implementation stores the flow and the resulting expressions in the tables. This will not be possible with a DuckDB table, so we either store the resulting expressions in a vector alongside the constraints, or in a separate structure, or don't precompute the resulting expressions at all. Given that we want to precompute as much as possible separately from the model creation, computing the indexes required for the incoming and outgoing expressions might be desirable, at first.
- Renaming/Style
  - [ ] Find better names for functions that add, construct, create, compute, etc.
  - [ ] Instead of passing variables, expressions, etc., unpack them from the model (`x = model[:x]`), or use `variables[:x]`, `expressions[:x]`? Work on #898 should help decide what we'll do. See https://jump.dev/JuMP.jl/stable/tutorials/getting_started/design_patterns_for_larger_models/#Generalize-constraints-and-objectives This will make the code cleaner in many places because it simplifies the function arguments, and the required variables are explicitly unpacked. Some of the code will still depend on the indexes of the variables, and these might still need to be passed. (A small illustration of this pattern is at the end of this post.)

Checklist of how we want things to look:

- [ ] Separation of concerns (data - structures - model - solution)
- [ ] Encapsulation/wrappers to improve readability and argument passing
- [ ] Easier identification of what things are (e.g., variable vs. set vs. expression) from context or structures
- [ ] Uniform naming and some written coding-style guidelines (how and where to time, naming of functions, function argument order, etc.)

Pipeline draft:

1. Read the data into a TulipaIO TulipaData structure (includes the connection) (read_csv, etc.)
2. Create all pre-model structures (graph, clustering stuff (rp and tf), partition stuff)
3. Create model data and structures
   1. Create variables
   2. Create constraint partitions
4. Create model

---

### Input data names

- Double-check the is_seasonal name, since it is used by more assets than the storage (use_timeframe?)
- #622
- [ ] #774

### `create_input_dataframes`

- [x] #702
- [x] #705
- Verify whether we can use only one df (without using a dict) for assets_profiles and assets_timeframe_profiles (better after having a pipeline with timeframe); have a pipeline example before changing that.
- table_tree stores all the data in dfs

### `create_internal_structures`

- [x] #706
- [x] Use DuckDB directly to calculate the weight as the sum of the mapping in the RepresentativePeriod. See PR #708.
- [ ] Clean the graph such that we only store the data that are not created before, to avoid duplication, e.g., only leave names for flows/edges.
  - Some of the operations here are inefficient, but they will be removed (e.g., the groupby on the profiles to store them in the graph)
- [ ] Have an internal table instead of an internal dictionary for the rep_periods_partitions. This allows more DuckDB integration. Example (a conversion sketch follows below):

  ```plaintext
  graph["Midgard"].rep_periods_partitions = Dict(
      1 => [1:3, 4:5, 6:12],
      2 => [1:1, 2:2, ... 6:6],
  )

  asset | rep_period | k | begin | end
  "Mid" | 1          | 1 | 1     | 3
  "Mid" | 1          | 2 | 4     | 5
  "Mid" | 1          | 3 | 6     | 12
  ```
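  A minimal sketch of that conversion using DataFrames.jl and DuckDB.jl; the table name `rep_periods_partitions` is taken from the example above, while `block_start`/`block_end` stand in for `begin`/`end`, which are reserved words in both Julia and SQL:

  ```julia
  using DataFrames, DuckDB

  # Hypothetical input: time-block partitions per representative period for one asset
  partitions = Dict(1 => [1:3, 4:5, 6:12])

  # Flatten the Dict of ranges into one row per (asset, rep_period, block)
  rows = [
      (asset = "Mid", rep_period = rp, k = k, block_start = first(blk), block_end = last(blk))
      for (rp, blocks) in partitions for (k, blk) in enumerate(blocks)
  ]
  df = DataFrame(rows)

  # Register the DataFrame so it can be queried (and joined) directly in DuckDB
  con = DBInterface.connect(DuckDB.DB)
  DuckDB.register_data_frame(con, df, "rep_periods_partitions")
  ```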
- [ ] Double-check whether, in the places where we need rep_periods profiles, we can just call DuckDB instead of creating all of them a priori.
- [ ] #713
- [ ] #643
- [ ] #717
- [ ] #738

### `compute_assets_partitions`

- [ ] Update `compute_assets_partitions!` to use DuckDB more efficiently
  - To compute the partitions of the timeframe; it should be possible to use a join or similar

### `compute_constraints_partitions`

- [ ] #642

### `compute_rp_partitions`

- This function needs a full refactor when changing to tables (DuckDB)
- Big change

### `solve_model` and `solve_model!`

- [ ] #115
- Store the solution in tables (a.k.a. DuckDB) instead of the solution structure and the graph (a rough sketch at the end of this post).
- Clean the solution structure so that it (maybe?) doesn't store any vector/array (only keep the objective function and some extra info). Even delete it?
- Create a new table for the investments (assets and flows)
- #637

### `create_model`

- [ ] Change to efficiency in the asset (see #596)
- [ ] #611
  - Comment: if we change all the partitioning to dataframes, including the constraints partition computation, then `construct_dataframes` is not necessary.
- [ ] constraints_partitions from Dict to df
  - When using tables, the filters will be on the tables, not on the graph
- [ ] #723
  - Change sets and names to be verbose (e.g., Ai -> assets_to_invest)
- [ ] Double-check that the AffExpr in the constraint dataframes are correctly created with the aggregation in the model. Is `unique` working as intended?
  - `add_expression_terms_intra_rp_constraints` will need a refactor, and it will determine changes we need before and after
  - Notes on code structure:
    - constraints partition: lowest/highest -> to get the correct partition
    - sum of workspace expressions: sum/unique -> to get the correct expression
  - The `profile_aggregation` functions are in the table in the documentation
- [ ] There is no need to use Iterators.flatten; use `for ... for ...` instead.
- [ ] Double-check getting rid of the lookups (even if the codecov is not 100%) in favor of the readability of the code. See the discussion in #804.
- [ ] Double-check the order of the indices in all the elements: constraints, variables, and expressions. They should follow the same order to be more efficient (at least in GAMS and AIMMS it is like that).
- [ ] Revisit comment in https://github.com/TulipaEnergy/TulipaEnergyModel.jl/pull/814#pullrequestreview-2318390219
- [ ] Revisit comment in https://github.com/TulipaEnergy/TulipaEnergyModel.jl/pull/823#pullrequestreview-2327451589
- [ ] Revisit comment in https://github.com/TulipaEnergy/TulipaEnergyModel.jl/pull/741#discussion_r1726850141

AND THEN:

- [ ] Rename input data to make more sense
- [ ] How to include Multi-Year?
- [x] Separate the model data from the scenario data. The scenario data for multi-year is currently hard-coded in `create_model!`; this needs to be generalized and relocated (see #803).
- [ ] Performance improvements related to multi-year (see #462)

Maybe also:

- #622
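For the "store the solution in tables" item under `solve_model` above, a minimal sketch, assuming the TulipaVariable-style struct sketched earlier; the function name, struct fields, and the temporary-view handling are assumptions:

```julia
using DataFrames, DuckDB, JuMP

# Sketch: write the solved values next to their precomputed indices as a DuckDB table.
# Assumes `var.indices::DataFrame` and `var.container::Vector{VariableRef}`.
function save_solution!(con, var, table_name::String)
    df = copy(var.indices)
    df.solution = value.(var.container)  # one solved value per index row
    DuckDB.register_data_frame(con, df, "solution_tmp")
    DBInterface.execute(
        con,
        "CREATE OR REPLACE TABLE $table_name AS SELECT * FROM solution_tmp",
    )
    return nothing
end
```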
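And a small illustration of the `x = model[:x]` unpacking pattern mentioned under Renaming/Style (the variable and constraint names are made up):

```julia
using JuMP

# Unpack the variable by name inside the function instead of passing it as an argument
function add_toy_balance_constraint!(model::Model)
    flow = model[:flow]
    @constraint(model, toy_balance, sum(flow) >= 1.0)
    return nothing
end

model = Model()
@variable(model, flow[1:3] >= 0)
add_toy_balance_constraint!(model)
```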
clizbe commented 3 months ago

### `solve_model` and `solve_model!`

- [ ] Store the solution in tables (a.k.a. DuckDB) instead of the solution structure and the graph.

@abelsiqueira @datejada Can I try tackling this one? Or should I wait?

abelsiqueira commented 3 months ago

@clizbe you can do it, it's mostly independent from the rest.