ACCESS-NRI / MED-condaenv

A repository for the squashfs'd MED conda environments
Apache License 2.0

Time cost issue while using `multiprocessing` in condaenv #93

Open rhaegar325 opened 1 month ago

rhaegar325 commented 1 month ago

Hi @rbeucher, @Dsroberts and @truth-quark:

I have some problems using multiprocessing with the modules in hh5 and xp65. My code runs well in my local mamba env in kj13, but when I use the modules in hh5 and xp65 it runs really slowly. After testing, it does run in parallel, but each process is much slower than it is in my local env. Below is some output from my tests to make this easier to understand.

This is the time cost when it runs in my local mamba env:

[Screenshot 2024-08-13 at 12:19:09 pm: timing output from the local mamba env]

I divided the process into two parts: the first part loads data from the .nc file, and the second part converts the data format. The output shows the max, min and average time cost of the first part, the second part, and the whole process.
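(A minimal sketch of this two-part timing inside one worker, assuming xarray for the load step; the convert step here is a generic stand-in, not the actual CMORise.py conversion:)

```python
import time
import xarray as xr

def timed_worker(path):
    """Time the two phases separately: load from .nc, then convert."""
    t0 = time.perf_counter()
    ds = xr.open_dataset(path).load()  # part 1: read the .nc file into memory
    t1 = time.perf_counter()
    arrays = {name: ds[name].values for name in ds.data_vars}  # part 2: format conversion (stand-in)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, t2 - t0  # load time, convert time, total
```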

And the following is the same code run on hh5/public/modules/conda_concept/analysis3-24.01 and xp65/access-med-0.8:

[Screenshots 2024-08-13 at 12:28:22 pm and 12:28:07 pm: timing output from analysis3-24.01 and access-med-0.8]

It's far slower than it should be. I also ran a test sequentially on the hh5 module:

[Screenshot 2024-08-13 at 12:32:32 pm: timing output from the sequential run on hh5]

That works normally, so I think there might be some problem with running multiprocessing. Do you have any idea about this? I'd really appreciate it if you could have a look.

dsroberts commented 1 month ago

Hi @rhaegar325. This is fairly unusual; I've never seen this much of a performance regression with these environments. There is additional overhead when launching the initial Python process, because a container is launched as well, but once that process is running, all subsequent processes launched by multiprocessing are inside the container, and so shouldn't take any longer to launch than if they were launched from any other distribution. Could you please upload your script somewhere so I can take a look?

rhaegar325 commented 1 month ago

Hi @dsroberts, thanks for your reply. It would be great if you have time to look into my code. Here is the link to my script: https://github.com/ACCESS-NRI/MED-utils/blob/main/access_med_utils/CMORise.py. More specifically, the function run in each subprocess is here: https://github.com/ACCESS-NRI/MED-utils/blob/6f0693fd453f1177ffc3483398e5521fa5fd353a/access_med_utils/CMORise.py#L202, and the multiprocessing part is here: https://github.com/ACCESS-NRI/MED-utils/blob/6f0693fd453f1177ffc3483398e5521fa5fd353a/access_med_utils/CMORise.py#L365. I hope that helps you get to the point quicker.

After several days of testing, I found that even though multiprocessing.Pool creates multiple processes, the CPU time used is never higher than the wall time. So I suspect something is blocking those processes; I'm not sure whether it's the I/O or something else.
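(For reference, one way to check this per worker is to compare `time.process_time()`, CPU time, against `time.perf_counter()`, wall time, around the worker call; a minimal sketch, with the worker function left generic:)

```python
import functools
import os
import time

def timed_call(func, *args):
    """Run func and report this process's wall time vs CPU time.

    If CPU time is much lower than wall time, the process spent most of
    its time blocked (e.g. on I/O or a lock) rather than computing.
    """
    wall0, cpu0 = time.perf_counter(), time.process_time()
    result = func(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    print(f"pid={os.getpid()} wall={wall:.2f}s cpu={cpu:.2f}s", flush=True)
    return result

# Usage inside the pool, wrapping the real worker from CMORise.py:
# pool.map(functools.partial(timed_call, mp_newdataset), file_set)
```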

dsroberts commented 4 weeks ago

Hi @rhaegar325. It's hard to tell without actually running it myself, but my initial suspicion is that this line (https://github.com/ACCESS-NRI/MED-utils/blob/main/access_med_utils/CMORise.py#L418) is involved. You're running pool_process in a loop, which creates and destroys a multiprocessing pool for every path in s_dic.keys(). That is a very expensive operation; you're far better off flattening the s_dic.keys() loop into the file_set list. Something like:

file_set = [ j for sub in [ glob.glob(non_cmip_path+path) for path in s_dic.keys() ] for j in sub ]

Then do

result = pool_process(mp_newdataset, file_set)

On the larger file_set. This has the advantage of only creating the multiprocessing Pool once. Once that's done, I'd be interested to see the difference between the two environments.
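(Putting that together, a sketch of the restructured loop; the names `s_dic`, `non_cmip_path` and `mp_newdataset` are taken from CMORise.py, and the plain `Pool` here stands in for whatever `pool_process` wraps:)

```python
import glob
import multiprocessing as mp

# Flatten the per-path loop into one file list, then create the Pool
# exactly once and map the worker over the whole list.
file_set = [f for path in s_dic.keys()
              for f in glob.glob(non_cmip_path + path)]

with mp.Pool() as pool:
    results = pool.map(mp_newdataset, file_set)
```

The worker processes now stay alive for the whole `file_set`, so the Pool start-up and teardown cost is paid once instead of once per path.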

rhaegar325 commented 4 weeks ago

Really appreciate your suggestion @dsroberts, I will give it a try first.

rhaegar325 commented 3 weeks ago

Hi @dsroberts, thanks for your advice on multiprocessing.Pool. I have updated that part; the new version is on the "update CMORise.py" branch, waiting to be merged.

However, the issue is still there. I have tried a couple of approaches; the script does create multiple processes, but it seems those processes are blocked somewhere, and I don't know exactly where.
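(One way to see where the workers block, not from the thread but a possible next step, assuming the single-Pool structure above with `mp_newdataset` and `file_set` as before: turn on multiprocessing's own logging and register a signal-triggered traceback dump in each worker, so a stuck process can be inspected from another shell with `kill -USR1 <pid>`:)

```python
import faulthandler
import logging
import multiprocessing as mp
import signal

def init_worker():
    # Dump this worker's Python traceback to stderr on SIGUSR1,
    # so a stuck process can be probed while it is still running.
    faulthandler.register(signal.SIGUSR1)

mp.log_to_stderr(logging.DEBUG)  # log pool start-up/shutdown events

with mp.Pool(initializer=init_worker) as pool:
    results = pool.map(mp_newdataset, file_set)
```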