Large MOST models (nt > 1000)

lordleoo commented 4 years ago

Building a multi-period MOST model for a large number of periods (e.g. 8,760 hours = 1 year) in one shot took 7 hours on a high-performance computing cluster just to build the model, without solving it. If you want to do optimal sizing (long-term planning), 7 hours per build means your optimization might take a year even on a cluster. I used MATLAB's profiler to see where most of the time was spent, and the main culprits are two routines: 1) add_named_set, which has several different uses and is where a lot of the time was spent, and 2) params_lin_constraint.

I went through both functions to try to make some improvements. I couldn't do anything about (1). However, in params_lin_constraint, I could achieve about 20% time savings by changing line #108 from:

    At = At + Akt_full;

to:

    At(:, i1:iN) = At(:, i1:iN) + Akt_full(:, i1:iN); %this line was added by ****; original was the one above and it was very slow for large At
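As a sanity check on that change, here is a minimal toy sketch (not MOST or MATPOWER code) showing that updating only the column block gives the same matrix as adding a full-size, mostly empty matrix. The sizes, the column range and the names Ak, A_full and A_block are made up for illustration; only At, Akt_full, i1 and iN come from the line quoted above.

```matlab
% Toy check, not MOST code: block update vs. full-size sparse addition.
nrow = 50;  ncol = 2000;          % hypothetical matrix size
i1 = 401;   iN = 440;             % hypothetical column range of the new block

At = sparse(nrow, ncol);          % accumulator
At(:, 1:10) = 1;                  % some existing entries outside the block

% A made-up sparse block that occupies columns i1:iN only.
Ak = sparse(double(mod(reshape(1:nrow*(iN-i1+1), nrow, []), 7) == 0));
Akt_full = sparse(nrow, ncol);
Akt_full(:, i1:iN) = Ak;

% Original form: add the full-size matrix.
A_full = At + Akt_full;

% Modified form: touch only the columns that can change.
A_block = At;
A_block(:, i1:iN) = A_block(:, i1:iN) + Akt_full(:, i1:iN);

fprintf('identical: %d, nnz = %d\n', isequal(A_full, A_block), nnz(A_block));
```

The two forms are only interchangeable when Akt_full is zero outside columns i1:iN, which is the assumption this sketch makes.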

Other options to circumvent the problem of building a large MOST model are:

1) Build the model once and modify certain parameters when you test different candidate solutions

Disadvantages:
- Moving around a variable of size 5 GB is going to be slow
Advantages:
- You can be confident about all inter-temporal coupling and inter-scenario coupling being done right

2) Build several sub-models (slices), which is much faster. For example, building 52 small models of one week each takes 3 minutes; after that, you can connect these models together. In this case, I'd advise making the sub-models NOT mutually exclusive. Instead, having one overlapping period between each two consecutive models helps. You can enforce a constraint: Pg(last time period of slice i) = Pg(first time period of slice i+1). This way, the ramping constraint between each scenario J(t) and each scenario J(t+1) is built properly. However, the stored-energy level still needs to be corrected manually.

Disadvantages:
- In MOST, the constraint on storedEnergyLevel in storage devices is written like this: for every time period t, storage_min <= sum(SoC(1:t)) <= storage_max. So you need to adjust this constraint for all later slices i > 1.
- Consolidating the variables of all slices makes adjusting the StoredEnergyLevel constraint a delicate matter. You can circumvent this obstacle by requiring the terminal storage of each slice to be at a certain level, but that changes the solution.
- If you mess up the order of variables in OM, the QP.A, QP.l and QP.u matrices might still be correct and you would still get a correct optimal solution. However, most_summary(om) would read the results wrong.
Advantages:
- Building time is very short; if coupling slices is done right, building small slices and connecting them is going to be very fast
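To make the slicing in option 2 concrete, here is a rough sketch that is independent of MOST. It computes index ranges for slices that share one period with their neighbour, and then shows the offset a cumulative storage bound needs in a later slice. All names (len, overlap, starts, stops, ds, carried) and the data are made up for illustration. The offset is computed for a fixed, known trajectory; in a coupled model the carried amount would itself depend on the variables of the earlier slices.

```matlab
% Hypothetical sketch, independent of MOST: overlapping slice index ranges.
nt      = 8760;                     % hours in the full horizon
len     = 168;                      % nominal slice length (one week)
overlap = 1;                        % periods shared by consecutive slices

step   = len - overlap;
starts = 1:step:(nt - overlap);     % first period of each slice
stops  = min(starts + len - 1, nt); % last period of each slice

% Made-up net stored-energy change per period for one storage unit.
ds = 0.5 * sin(2*pi*(1:nt)'/24);
S  = cumsum(ds);                    % cumulative level over the full horizon

% Inside slice k > 1, a cumulative sum restarted at the slice's first period
% misses everything accumulated before it, so the bounds must shift:
%   storage_min - carried <= local(t) <= storage_max - carried
k       = 3;
carried = S(starts(k) - 1);
local   = cumsum(ds(starts(k):stops(k)));
err     = max(abs(local + carried - S(starts(k):stops(k))));

fprintf('%d slices, max reconstruction error %.2e\n', numel(starts), err);
```

With this indexing the last period of each slice coincides with the first period of the next one, which is where the Pg equality constraint described above would be attached.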

3) Build several sub-models (slices) one at a time, which is much faster. Build slice 1 and solve it; if it is feasible, proceed to build slice 2; if not, terminate with a penalty proportional to the slice number.

Disadvantages:
- A serious disadvantage is that the solver may be wasteful or relaxed in earlier slices, so finding a feasible solution for later hours may require oversizing the storage. This isn't a concern if you have very few components and a simple system. For a complex system, with transmission losses and different storage devices with different characteristics, the obtained solution is unlikely to be optimal. You can circumvent this obstacle by requiring the terminal storage at the end of each slice to be at a certain level, but that changes the solution.
- You still have to manually correct the constraint StoredEnergyLevel
Advantages:
- Saves time on building later slices if early slices don't work
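The control flow of option 3 can be sketched as follows. This is only an outline: is_feasible is a hypothetical placeholder for "build slice k and solve it", and the slice count, the failing slice and the penalty weight are made-up numbers.

```matlab
% Hypothetical stand-ins: in practice each check would build and solve a slice.
nslices       = 52;
first_bad     = 10;                     % made-up: slice 10 turns out infeasible
is_feasible   = @(k) k ~= first_bad;    % placeholder for build + solve
penalty_scale = 1e3;                    % made-up penalty weight

penalty = 0;
n_built = 0;
for k = 1:nslices
    n_built = k;                        % slice k has been built and solved
    if ~is_feasible(k)
        penalty = penalty_scale * k;    % penalty proportional to slice number
        break;                          % later slices are never built
    end
end

fprintf('built %d of %d slices, penalty = %g\n', n_built, nslices, penalty);
```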
  
I think I will proceed with option 2. With option 2, you can save the model and adjust it instead of rebuilding it every time.

Constraints that involve time coupling across periods (not across wind scenarios or between k=0 and contingency k) are: 'Rrp', 'Rrm', 'mindown', 'minup', 'uvw'.

rdzman commented 4 years ago

One quick thing, right off the bat ... MATPOWER/matpower#70 addresses params_lin_constraint() slowness on these large MOST models. I have a case where the time for params_lin_constraint() goes from almost 7 minutes down to about 5 seconds. I'm working on finalizing the logic for when to use the new method, since it is slower on certain cases, and much faster on others.

lordleoo commented 4 years ago

I apologize if this issue was known and solved. Thanks for your quick answer though. I actually am running MATPOWER7.0b1;

Is this new feature exclusive to MATPOWER 7 (not in 7.0b1)? I know there is a newer MATPOWER package, but I have already made many changes to my package and written comments to myself inside many files.

rdzman commented 4 years ago

It is being worked in pull request MATPOWER/matpower#70 for inclusion into the master branch, which I hope to complete very soon. So, it's not yet in the master branch, let alone in any numbered release of MATPOWER.

rdzman commented 4 years ago

I confess I question whether attempting to solve a single optimization with 8760 hours is what you really want to do even if it were computationally feasible. There are good reasons, besides the obvious computational ones, for not doing unit commitment with an hourly 1 year horizon (e.g. data). Then again, there are often good reasons to use a tool in ways the creator of the tool never envisioned.

May I ask what is the context of this 8760 hour problem you are trying to solve with MOST?

lordleoo commented 4 years ago

I wasn't doing UC. I am testing whether the capacity (sizing) of a combination of renewables and storage devices can support a system or not; commitment is not an issue here.

This inspired me to modify MOST itself to optimize the sizes/capacities of system devices and avoid a two-level optimization problem. I looked at the most.m code and I assume it is doable (but it needs time, which I don't have):

1) introduce a new variable: device_capacity
2) change the xmax on all Pg variables to Inf
3) add a new constraint: Pg < device_capacity
4) add a cost parameter for device_capacity
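The four steps can be illustrated on a toy problem with plain sparse matrices. This is only a sketch under stated assumptions, not how most.m builds its model: a single device, a made-up output profile d, a made-up cap_cost, and a variable vector x = [Pg; device_capacity]. The constraint is written with <= rather than a strict inequality, as usual in an LP, and no solver is called; the sketch only checks which points satisfy the new constraint.

```matlab
% Toy sizing formulation, not MOST code. All data below are made up.
nt       = 24;
d        = 50 + 30*sin(2*pi*(1:nt)'/nt);  % output the device must deliver
cap_cost = 100;                           % cost per unit of capacity

% Variable vector x = [Pg(1); ...; Pg(nt); device_capacity]   (step 1)
xmin = zeros(nt + 1, 1);
xmax = Inf(nt + 1, 1);                    % step 2: no fixed upper bound on Pg

% Step 3: Pg(t) - device_capacity <= 0 for every period t
A = [speye(nt), -ones(nt, 1)];
l = -Inf(nt, 1);
u = zeros(nt, 1);

% Step 4: linear cost on the capacity variable only
c = [zeros(nt, 1); cap_cost];

% A capacity equal to the peak output satisfies the constraint,
% anything smaller violates it.
x_ok  = [d; max(d)];
x_bad = [d; max(d) - 1];
ok_feasible  = all(A*x_ok  <= u + 1e-9);
bad_feasible = all(A*x_bad <= u + 1e-9);

fprintf('peak-sized: %d, undersized: %d, capacity cost: %g\n', ...
    ok_feasible, bad_feasible, c' * x_ok);
```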

I'd call it: MOPT: matpower optimal planning tool :smile: This could be a feature-request issue

lordleoo commented 4 years ago

> It is being worked in pull request MATPOWER/matpower#70 for inclusion into the master branch, which I hope to complete very soon. So, it's not yet in the master branch, let alone in any numbered release of MATPOWER.

May I ask how soon? thanks in advance

lordleoo commented 4 years ago

> One quick thing, right off the bat ... MATPOWER/matpower#70 addresses params_lin_constraint() slowness on these large MOST models. I have a case where the time for params_lin_constraint() goes from almost 7 minutes down to about 5 seconds. I'm working on finalizing the logic for when to use the new method, since it is slower on certain cases, and much faster on others.

I read the issue. That looks great, but the issue doesn't touch upon other slow functions:

1) opt_model.add_named_set
line 87: om_ff.order(om_ff.NS).name = name;
line 218: om.(ff) = om_ff
More time is spent here than in params_lin_constraint.

2) opt_model.add_var
line 88: om.add_named_set('var', name, idx, N, v0, vl, vu, vt);
An equal amount of time is spent there as in params_lin_constraint.

The cause of the issue seems to be the same, an expanding sparse matrix.

rdzman commented 4 years ago

It does look like the bottleneck now for building these large models is in add_named_set(). But it is not the same as the one we are addressing in params_lin_constraint().

I actually found an amazing single fix for both issues, believe it or not (must have been some divine inspiration, because I don't know how it even occurred to me to try something like this).

Simply add ...

    om.(ff) = [];

... right after line 46 in add_named_set().

On a big model of mine it cut the time in add_named_set() from about 123 secs down to 24 secs.

This is the kind of stuff you aren't supposed to have to think about when programming in a high-level language like Matlab. 😜 If I'm guessing correctly, after executing line 46, there are two references to one big struct. When we go messing with the contents of that struct (in 87), it creates a new copy (to keep the original unchanged), allocating new memory, etc., but then we replace the original with the copy in 218, and it frees up the memory for the original. By removing the extra reference to the original (with my proposed added line) before modifying it, it removes the need to allocate and free memory for the extra (unused) copy.
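The pattern can be tried on a plain struct. This is a sketch only: the field name data, the sizes and the loop are made up, and whether the second loop actually avoids the copy depends on the MATLAB (or Octave) version, so treat the timings as something to measure rather than a guaranteed result.

```matlab
% Toy illustration of the "drop the extra reference first" pattern.
n   = 300;                          % number of updates (made-up)
big = zeros(1, 2e5);                % stand-in for a large field

% Variant 1: modify while two references to the same data exist.
s1.data = big;
tic;
for k = 1:n
    f = s1.data;                    % f and s1.data refer to the same array
    f(k) = k;                       % write while shared: a copy may be made
    s1.data = f;                    % the old array is released here
end
t_shared = toc;

% Variant 2: clear the field before modifying the extracted value.
s2.data = big;
tic;
for k = 1:n
    f = s2.data;
    s2.data = [];                   % drop the second reference first
    f(k) = k;                       % can now be done without a copy
    s2.data = f;
end
t_cleared = toc;

same = isequal(s1.data, s2.data);   % both variants give the same result
fprintf('shared: %.4f s, cleared: %.4f s, same result: %d\n', ...
    t_shared, t_cleared, same);
```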

I'll try to get both of these fixes into MATPOWER as soon as I can. If I don't finish it today, it'll probably be late next week.

Thanks for bringing this to my attention.

rdzman commented 4 years ago

These issues have now been fixed. See MATPOWER/matpower#70 and MATPOWER/matpower#79.

MATPOWER / most

Large MOST models (nt > 1000) #7
