RWTH-EBC / AixCaliBuHA

Wield this tool to be King Arthur of your models.
MIT License

parallelization of calibration #20

Closed jonasklingebiel1 closed 1 year ago

jonasklingebiel1 commented 2 years ago

Enabling parallel computing will accelerate the calibration process.

SciPy's Differential Evolution is able to parallelize the optimization. Please integrate and test this functionality and create an example; adapting an existing example should be sufficient.
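For context, scipy exposes this via the `workers` argument of `differential_evolution`; a minimal sketch with a toy objective (not our calibration objective) might look like this:

```python
from scipy.optimize import differential_evolution

# Toy objective; in a calibration this would wrap a simulation run.
def objective(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

bounds = [(-5, 5), (-5, 5)]

# workers=-1 evaluates the population in parallel on all available cores;
# this requires the objective to be picklable. With parallel workers,
# updating must be "deferred".
result = differential_evolution(objective, bounds, workers=-1, updating="deferred")
print(result.x, result.fun)
```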

I will assign Jonas Betscher to this issue asap.

@FWuellhorst Do you have any comment/advice?

FWuellhorst commented 2 years ago

Yes, I think scipy requires a picklable objective function. As this is not possible with our FMU and Dymola APIs, we can only parallelize within the API directly. I think pymoo could handle our way of parallelization. The scipy approach would effectively amount to a DDoS attack on our license server.
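To illustrate the constraint: scipy's workers spawn new processes, which must pickle everything the objective needs, and an API object holding an open handle or connection cannot be pickled. `FakeSimulationAPI` below is a hypothetical stand-in, not the actual ebcpy API:

```python
import pickle

class FakeSimulationAPI:
    """Hypothetical stand-in for an FMU/Dymola API holding an unpicklable resource."""
    def __init__(self):
        # Real APIs hold sockets, process handles or FMU instances;
        # an open file handle triggers the same pickling problem.
        self._handle = open(__file__)

    def simulate(self, parameters):
        return sum(parameters)  # placeholder result

api = FakeSimulationAPI()
print(api.simulate([1.0, 2.0, 3.0]))

try:
    # This is what scipy's worker processes would implicitly need to do.
    pickle.dumps(api)
except TypeError as err:
    print("not picklable:", err)
```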

JBetscher commented 2 years ago

Solution

Multiprocessing was achieved by using pymoo as a solver. When using multiprocessing, the entire population is handed over to the new method mp_obj in the calibrator class. There, the population is divided into chunks whose size equals the chosen number of CPUs. In the same loop, a list with the new parameters is created and handed over to the new method multi_simulate of the SimulationAPI class. This method starts multiple simultaneous simulations by applying pool.starmap to the simulate method. Once all simulations are completed, a list with the results is returned to mp_obj and the results are logged via a queue. After the whole population has been processed, the objectives are returned to the solver and the next population is handed over, starting the process again.
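A simplified sketch of this flow is shown below; the function names follow the description above, but the signatures and the placeholder simulation are assumptions rather than the actual implementation:

```python
import multiprocessing as mp

def simulate(parameter_set):
    """Stand-in for SimulationAPI.simulate: run one simulation."""
    return sum(parameter_set)  # placeholder result

def multi_simulate(parameter_sets, n_cpu):
    """Simplified sketch of SimulationAPI.multi_simulate:
    start several simulations simultaneously via pool.starmap."""
    with mp.Pool(n_cpu) as pool:
        return pool.starmap(simulate, [(p,) for p in parameter_sets])

def mp_obj(population, n_cpu):
    """Simplified sketch of the calibrator's mp_obj:
    split the population into chunks of size n_cpu, simulate each chunk
    in parallel and collect the objective values."""
    objectives = []
    for start in range(0, len(population), n_cpu):
        chunk = population[start:start + n_cpu]
        results = multi_simulate(chunk, n_cpu=n_cpu)
        # Logging of the results would happen here, sequentially via a queue.
        objectives.extend(results)
    return objectives

if __name__ == "__main__":
    population = [[i, i + 1] for i in range(8)]
    print(mp_obj(population, n_cpu=4))
```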

Multiprocessing works for both FmuAPI and DymolaAPI.

What has changed for the user

A new example named e6_multiprocessing_calibration_example has been added; it is based on example e4. The changes are the new variable n_cpu and new kwargs for using pymoo as a solver.

When using pymoo, the chosen framework must be pymoo and the method specifies the algorithm to be used. Right now the preset algorithm is GA (Genetic Algorithm). Switching to another algorithm requires getting to know its hyperparameters and adjusting the kwargs accordingly.
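Under the hood, the preset GA corresponds to pymoo's genetic algorithm. As a point of reference, a minimal standalone pymoo run (independent of AixCaliBuHA; import paths assume pymoo 0.6, and Sphere is just a toy problem) looks roughly like this:

```python
import numpy as np
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.soo.nonconvex.ga import GA
from pymoo.optimize import minimize

class Sphere(ElementwiseProblem):
    """Toy single-objective problem standing in for the calibration objective."""
    def __init__(self):
        super().__init__(n_var=2, n_obj=1, xl=-5.0, xu=5.0)

    def _evaluate(self, x, out, *args, **kwargs):
        out["F"] = float(np.sum(x ** 2))

# GA hyperparameters such as pop_size are the kind of kwargs mentioned above.
res = minimize(Sphere(), GA(pop_size=20), ("n_gen", 30), seed=1, verbose=False)
print(res.X, res.F)
```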

The README has also been updated, noting that pandas==1.3.5 and tables==3.6.1 have to be used for the setup to work.

Speed advantage

The speed advantage is highly dependent on the simulation time, as logging does not happen in parallel. Since the simulation time is usually many times longer than the logging time, the speed advantage correlates almost directly with the number of CPUs used. For example, a calibration using n_cpu = 4 should take roughly a quarter of the time of one using n_cpu = 1.
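A rough back-of-the-envelope model of this claim, with assumed rather than measured timings:

```python
# Assumed numbers for illustration only.
t_sim = 60.0   # seconds per simulation (dominates the runtime)
t_log = 1.0    # seconds of sequential logging per simulation
n_sims = 400   # total number of simulations in the calibration

def total_time(n_cpu):
    # Simulations run n_cpu at a time, logging stays sequential.
    return n_sims * (t_sim / n_cpu + t_log)

# Roughly 3.8x speed-up for n_cpu = 4, slightly below 4 because of logging.
print(total_time(1) / total_time(4))
```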

jonasklingebiel1 commented 2 years ago

Hi @FWuellhorst, I checked the new code and tested its functionality. It worked for all tested scenarios. Do you have any further ideas for improvement?

github-actions[bot] commented 2 years ago

Branch 20-parallelization-of-calibration created!

FWuellhorst commented 2 years ago

@jonasklingebiel1 @JBetscher First of all, thanks for this really nice contribution! Do you have 30 minutes in the next few days to talk about the changes, especially those in ebcpy's simulation API? :) I think this would be more efficient than a review on GitHub.

JBetscher commented 2 years ago

@FWuellhorst

Yes, I'll message you via Slack to find a time for a call.