Stevogallo opened this issue 6 months ago
Hello @Stevogallo, thanks for opening this issue. The speed of the parallel processing can depend on the local configuration. Would you be able to share your use case so that I can run tests on my computer? Your laptop's CPU and RAM would also be of interest, if you are OK with sharing that information.
Hi @Bachibouzouk, I'm using the default "Input File 1". I'm working on a Jupyter notebook we created, https://github.com/SESAM-Polimi/RAMP-Jupyter, because it makes teaching in class much easier.
My laptop:
- Processor: 13th Gen Intel(R) Core(TM) i7-1355U, 1700 MHz, 10 cores, 12 logical processors
- Installed physical memory (RAM): 16.0 GB
- OS: Windows 11 Pro
Let me know if you need something else.
Hi @Stevogallo - thanks for the information. Do you use one of the two notebooks under ramp/Jupyter Notebooks in the repository you provided?
After a quick search, it seems other people also have issues with the combination of multiprocessing, Jupyter and Windows. Could you try running the same code from a plain Python script, to check whether the problem comes from Jupyter?
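Something along these lines should do. I am writing it from memory, so treat the RAMP import paths and the UseCase usage below as assumptions and adjust them to your installed version; only the two generate_* method names are taken from this thread:

```python
# test_ramp_timing.py - run with "python test_ramp_timing.py" from a terminal, outside Jupyter.
import time
from multiprocessing import freeze_support

from ramp.core.core import UseCase                # assumed location of the UseCase class
from ramp.example.input_file_1 import User_list   # assumed module for "Input File 1"


def timed(label, func, *args, **kwargs):
    """Call func once and print the elapsed wall-clock time."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    print(f"Time taken by {label}: {time.perf_counter() - start:.2f} seconds")
    return result


if __name__ == "__main__":
    # The __main__ guard (and freeze_support) is required on Windows, because
    # multiprocessing starts fresh interpreters that re-import this module.
    freeze_support()

    use_case = UseCase(users=User_list)
    # use_case.initialize(num_days=365)  # may be needed depending on your RAMP version

    timed("generate_daily_load_profiles", use_case.generate_daily_load_profiles)
    timed(
        "generate_daily_load_profiles_parallel",
        use_case.generate_daily_load_profiles_parallel,
    )
```

If the parallel call is also much slower in the plain script, then Jupyter is not the culprit.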
Sorry @Bachibouzouk, I didn't specify. I'm using "\RAMP-Jupyter-main\RAMP-Jupyter-main\ramp\Jupyter Notebooks\RAMP Example Village - Excel.ipynb".
I'll try without Jupyter and let you know.
@Bachibouzouk, an update: I ran Input_file_1 from Visual Studio, here are the results:
Time taken by generate_daily_load_profiles_parallel: 950.19 seconds
Time taken by generate_daily_load_profiles: 105.32 seconds
@Stevogallo - I also tested locally, and the parallel processing takes more time. As the multiprocessing version is not pinned, it could be due to a newer version of multiprocessing. In the profiling, the acquire method of the _thread.lock object takes up 50% of the time. I unfortunately don't have the resources to investigate this at the moment :(
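For reference, this kind of profiling can be reproduced with something like the following (the profiled call is only a placeholder standing in for the actual RAMP parallel call):

```python
# profile_parallel.py - profile the parallel run and inspect the top cumulative entries.
import cProfile
import pstats


def run_parallel():
    # Placeholder: put the actual call here, e.g. use_case.generate_daily_load_profiles_parallel()
    pass


if __name__ == "__main__":
    cProfile.run("run_parallel()", "parallel.prof")
    stats = pstats.Stats("parallel.prof")
    # Entries like "method 'acquire' of '_thread.lock' objects" show up
    # near the top when sorting by cumulative time.
    stats.sort_stats("cumulative").print_stats(20)
```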
I was still puzzled by this and looked at it more closely: it depends quite a lot on the chunksize parameter (how many tasks get assigned as a batch to each CPU). At the moment it is fixed to 4. I fiddled with it a bit and managed to run the code in 3 seconds with parallel processing versus 17 seconds without it. The problem is that the best value depends on the size of the task list, which in turn depends on how many days, appliances and users we have. I will try to find a compromise. In the meantime you can try to fiddle with the parameter yourself @Stevogallo.
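To see the effect outside of RAMP, here is a minimal standalone illustration (not RAMP code) of how much Pool.map throughput depends on chunksize when the individual tasks are cheap:

```python
# chunksize_demo.py - compare Pool.map wall-clock time for several chunksize values.
import time
from multiprocessing import Pool


def small_task(x):
    # Stand-in for one cheap unit of work (e.g. one simulated day for one user).
    return sum(i * i for i in range(1_000)) + x


if __name__ == "__main__":
    tasks = list(range(20_000))
    for chunksize in (1, 4, 64, 512):
        start = time.perf_counter()
        with Pool() as pool:
            pool.map(small_task, tasks, chunksize=chunksize)
        print(f"chunksize={chunksize:>3}: {time.perf_counter() - start:.2f} s")
```

As far as I know, when chunksize is left unset, Pool.map itself picks roughly len(iterable) / (4 * number_of_workers), which is why the "right" value scales with the size of the task list.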
I was testing the newly added option "generate_load_profiles_parallel", which according to your comments should be faster. However, I'm experiencing tenfold computation times, and double the RAM use, compared to when I use the old function, "generate_load_profiles".