calliope-project / calliope

A multi-scale energy systems modelling framework
https://www.callio.pe
Apache License 2.0

Memory consumption and computational time #69

Closed: arnaud-leroy closed this issue 6 years ago

arnaud-leroy commented 7 years ago

Hello,

I am facing some issues related to the size of my model and I would like to know how to reduce the memory consumption and computational time (I am not using the new MILP features).

The server used for the optimization has 16 cores and 64 GB of memory.

I am trying to represent a one-year hourly energy model of the South Tyrol region (electricity and heating) and would like to build a bigger model (+/- 30 regions) in the future. However, my tests show that even 5 locations [one using 19 techs and the 4 others 3-4 techs each (one demand, one unmet demand, one or two supply)] are already too much for the server we are using (it runs out of memory). 2 locations, one with 27 techs and the other with 4 techs, need +/- an hour and a half of computational time.

I will try to add benchmark results to this post during the day.

arnaud-leroy commented 7 years ago

I ran a first test today, using 1 to 5 locations and 1 to 5 supply technologies with the same carrier. A demand and an unmet-demand technology were also present in each location.

It seems that the computational time is proportional to the number of different technologies and to the number of locations.

For example: location1 contains the supply technologies sup1, sup2, sup3, sup4 and sup5 (all with the same characteristics). The computational time is the same whether location2 uses only sup1 or all 5 supply technologies from the first location.

I can't say whether this is normal or not, but I would have expected proportionality with the total number of technologies across all locations, meaning that the computational time would differ between location2 using only sup1 and location2 using the 5 supply technologies of location1.

brynpickering commented 7 years ago

This is primarily the fault of Pyomo, although we could be taking measures in Calliope to partially mitigate it...

There are three possible time barriers:

  1. Pyomo creates constraints by looping over everything and creating a new Python object each time (see the sketch after this list). So a constraint which applies to all technologies at all locations in all timesteps involves a lot of loops and a lot of generated objects! It also involves high RAM utilisation, as Pyomo keeps all those objects to hand. For the scale of problems you are talking about, this is in the region of hundreds of millions of constraints to be generated. I have breached 64GB in this part alone.

  2. Pyomo spends a lot of time converting its structure into an LP file, to be read in by whichever solver you're choosing to use (I've had some LP files take hours to generate). This also doubles RAM usage while the LP file is being generated, although it should then halve again once the LP file is complete.

  3. Computational optimisation just takes time, especially so if there are binary/integer variables and/or you're not using a commercial solver. Even if you overcome these barriers, a problem with hundreds of millions of variables just needs to be sat on a high performance computer and left for a few days. In terms of memory usage, I know CPLEX will use some RAM (~a few GB) at any given time, but when it has finished a particular branch it will dump that to the hard drive, so you don't keep increasing your RAM usage.
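To make the first barrier concrete, here is a minimal sketch of how Pyomo instantiates one Python object per constraint index. This is illustrative only, not Calliope's actual constraint code; the set sizes and the fixed capacity bound are made up:

```python
import pyomo.environ as pyo

model = pyo.ConcreteModel()
model.locs = pyo.Set(initialize=['loc1', 'loc2'])
model.techs = pyo.Set(initialize=['sup1', 'sup2', 'sup3'])
model.timesteps = pyo.RangeSet(1, 8760)  # hourly, one year

model.carrier_prod = pyo.Var(model.locs, model.techs, model.timesteps,
                             within=pyo.NonNegativeReals)

def capacity_rule(m, loc, tech, t):
    # Pyomo calls this rule, and builds a constraint object, once per
    # (loc, tech, t) tuple: 2 * 3 * 8760 = 52,560 objects even for this
    # toy example. All of them are held in memory.
    return m.carrier_prod[loc, tech, t] <= 100  # placeholder capacity bound

model.capacity_constraint = pyo.Constraint(model.locs, model.techs,
                                           model.timesteps, rule=capacity_rule)
```

Scale the sets up to dozens of locations and techs, with many constraints over 8760 timesteps, and the object count quickly reaches the hundreds of millions mentioned above.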

The first two points are ones we've discussed quite a lot and hope to have a clear plan for mitigating in the next month or so (i.e. moving away from relying on Pyomo). The third is difficult to avoid: you have to pay the price for complex problems. We have certainly reduced the number of decision variables required to be solved, e.g. in creating the conversion_plus technology type, but there is more to be done there. The problem you mention with regards to having a time penalty even when technologies are not set at a given location feeds into all three of the points. With a commercial solver, the optimisation time should be proportional to the total sum of technologies at each location, as it is good at throwing away unnecessary decision variables (i.e. those set to == 0); I'm not sure that GLPK is as good at doing that. Avoiding the time penalty in Pyomo requires some clever work on how you define your sets (see the sketch below). It's something we'd love to do, but just haven't had time to yet.
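As an illustration of that set-definition idea (a sketch of the general technique, not the actual merge_loc_tech_sets implementation), variables can be indexed over only the valid (location, tech) pairs rather than the full cross product, so Pyomo never instantiates objects for technologies that don't exist at a location:

```python
import pyomo.environ as pyo

model = pyo.ConcreteModel()

# Hypothetical sparse membership: sup2 is not defined at loc2.
valid_loc_techs = [('loc1', 'sup1'), ('loc1', 'sup2'), ('loc2', 'sup1')]
model.loc_techs = pyo.Set(initialize=valid_loc_techs, dimen=2)
model.timesteps = pyo.RangeSet(1, 24)

# Dense indexing would create len(locs) * len(techs) * len(timesteps)
# variables; sparse indexing creates only len(valid_loc_techs) * len(timesteps).
model.carrier_prod = pyo.Var(model.loc_techs, model.timesteps,
                             within=pyo.NonNegativeReals)
```

A commercial solver's presolve removes many of the superfluous dense-indexed variables anyway, which is why the gain shows up mostly in problem generation rather than in optimisation time.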

For your server, the high number of cores won't be of much use unless you are doing parallel runs!

brynpickering commented 7 years ago

Update on this. I've created a new branch: github.com/calliope-project/calliope/tree/merge_loc_tech_sets

Take a look to see if it helps to reduce the size of your problem for optimisation. If it seems useful, then I'll look to merge it into Calliope master.

brynpickering commented 7 years ago

On a problem which is large in the spatial and technology dimensions (c: 5, t: 24, techs: 25, x: 93, y: 198), the new branch reduces the problem size from 4,925,479 variables to 296,828 (~16x smaller). This reduces Pyomo preprocessing time (~80s vs 530s) and the size of the Pyomo LP file (~80MB vs 1.36GB). However, CPLEX reduces both problems to a similarly sized MIP problem:

0.5.3: Reduced MIP has 15926 rows, 11888 columns, and 42908 nonzeros.
0.5.3 w/ loc_tech sets: Reduced MIP has 16122 rows, 12064 columns, and 43516 nonzeros.

Optimising takes longer without the new sets, but this is likely a function of stress on my machine at the time of running each one:

0.5.3: 19.88 sec
0.5.3 w/ loc_tech sets: 4.36 sec

On the face of it, this is a quick way to reduce preprocessing and post-processing time, although optimisation time will likely remain of a similar order of magnitude. It highlights the inefficiency of Pyomo's problem generation process, as well as Calliope's!

arnaud-leroy commented 7 years ago

Hi Bryn,

I tried to install the new branch in Anaconda but I don't think I succeeded (the preprocessing time doesn't change between master and the branch).

Do you have a step-by-step way to use the new branch with Anaconda?

Thanks, Arnaud

EDIT: I managed to do it this way:

  1. clone the repository
  2. create and activate a conda environment
  3. switch the git repository from master to the branch: git checkout branchname
  4. install with: python setup.py develop
brynpickering commented 7 years ago

I see your edit! If you have a local instance of calliope available to you, associated with a particular anaconda environment, then you should be able to just switch your branch using a git command and it will switch the currently active calliope branch in your environment. That way, one environment is enough. Keep in mind that if you use Python interactively (e.g. Jupyter) then you'll need to restart the kernel every time you change your active branch, in order to import the correct version of calliope.
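A quick sanity check after restarting the kernel (assuming calliope exposes __version__, as most packages do) is to confirm which install is actually being imported:

```python
import calliope

print(calliope.__version__)  # version string of the active install
print(calliope.__file__)     # should point into your git clone, not site-packages
```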

arnaud-leroy commented 7 years ago

Thanks :)

After a first group of tests, I can confirm that this branch decreases both preprocessing and solving time.

For example: 5 locations with 2 technologies + 1 location with 7 technologies (the same 2 plus 5 others).

Results:
Master: 531s (146s preprocessing, 355s solving the model)
New branch: 149s (62s preprocessing, 71s solving the model)

Great Job 👍

arnaud-leroy commented 7 years ago

Hello, while the merge_loc_tech_sets branch is working on my computer, I am facing some trouble installing it on a Linux server.

I used checkout to get onto the merge_loc_tech_sets branch and then tried to install it with python setup.py develop, which gave the following output:

['config/*.yaml', 'test/common/*.yaml', 'example_models/urban_scale/*.csv', 'example_models/urban_scale/*.yaml', 'example_models/urban_scale/*.rst', 'example_models/national_scale/*.csv', 'example_models/national_scale/*.yaml', 'example_models/national_scale/*.rst', 'example_models/urban_scale/model_config/*.csv', 'example_models/urban_scale/model_config/*.yaml', 'example_models/urban_scale/model_config/*.rst', 'example_models/national_scale/model_config/*.csv', 'example_models/national_scale/model_config/*.yaml', 'example_models/national_scale/model_config/*.rst', 'example_models/urban_scale/model_config/data/*.csv', 'example_models/urban_scale/model_config/data/*.yaml', 'example_models/urban_scale/model_config/data/*.rst', 'example_models/national_scale/model_config/data/*.csv', 'example_models/national_scale/model_config/data/*.yaml', 'example_models/national_scale/model_config/data/*.rst', 'test/common/t_time/*.csv', 'test/common/t_constraints_from_file/*.csv', 'test/common/t_1h/*.csv', 'test/common/t_positive_demand/*.csv', 'test/common/t_6h/*.csv', 'test/common/t_erroneous/*.csv']
running develop
running egg_info
writing calliope.egg-info/PKG-INFO
writing dependency_links to calliope.egg-info/dependency_links.txt
writing entry points to calliope.egg-info/entry_points.txt
writing requirements to calliope.egg-info/requires.txt
writing top-level names to calliope.egg-info/top_level.txt
reading manifest file 'calliope.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'calliope.egg-info/SOURCES.txt'
running build_ext
Creating /home/aleroy/miniconda3/lib/python3.6/site-packages/calliope.egg-link (link to .)
Removing calliope 0.5.3 from easy-install.pth file
Adding calliope 0.5.3 to easy-install.pth file
Installing calliope script to /home/aleroy/miniconda3/bin

Installed /home/aleroy/Calliope/calliope
Processing dependencies for calliope==0.5.3
Searching for ruamel_yaml>=0.11
Reading https://pypi.python.org/simple/ruamel_yaml/
No local packages or working download links found for ruamel_yaml>=0.11
error: Could not find suitable distribution for Requirement.parse('ruamel_yaml>=0.11')

Apparently it can't find ruamel, even though it is installed (I can import ruamel.yaml in Python). Has anyone had the same problem, or an idea of how to solve it?
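For reference, the name you import (ruamel.yaml) and the distribution name that setuptools searches for (ruamel_yaml) are not the same thing, so a working import doesn't guarantee setuptools can see the package. A small diagnostic sketch to list what is actually registered:

```python
import pkg_resources

# List every installed distribution whose name mentions 'ruamel'; if nothing
# prints, setuptools has no metadata for it even though the import works.
for dist in pkg_resources.working_set:
    if 'ruamel' in dist.project_name.lower():
        print(dist.project_name, dist.version, dist.location)
```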

Thanks, Arnaud

brynpickering commented 7 years ago

I had the same issue on a Windows device actually, although now I don't remember how I dealt with it. I'll open a new issue for ruamel_yaml, as I don't think it is specific to the new branch.

arnaud-leroy commented 6 years ago

I installed it again and it worked perfectly... 👍

GraemeHawker commented 6 years ago

Is this the performance improvement that was listed against 0.5.4? Happy to help test further if not

brynpickering commented 6 years ago

Yep, and for the development of 0.6.0 we have gone a step further, removing even more superfluous variables that might cause Pyomo to spend more time/memory than is absolutely necessary. It can still blow out of proportion, though, if there are a lot of technologies at a lot of locations.

The next test we'd like to try is building the problem with the Julia programming language rather than Pyomo. But this is proving tricky, as Julia is a more nuanced language in which to tap into efficiency improvements. Once we have it communicating properly with Calliope, we'll open it up to be tested! Currently, it's in development here: https://github.com/calliope-project/calliope/tree/ng/calliope/backend/julia

brynpickering commented 6 years ago

Improving model solution time is a constant topic in this field, and is really framework-independent. Ideas for improving model run time can be found in the documentation.
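One such documented idea is reducing time resolution, so the backend builds far fewer timesteps. A sketch using 0.6.x-style override keys (the file name and exact option keys are illustrative; check the time resolution section of the documentation for your version):

```python
import calliope

# Resample the model's hourly timeseries to 6-hourly before building the
# optimisation problem, shrinking the timestep dimension roughly sixfold.
model = calliope.Model(
    'model.yaml',  # hypothetical model definition file
    override_dict={
        'model.time': {
            'function': 'resample',
            'function_options': {'resolution': '6H'},
        }
    },
)
model.run()
```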