EcoExtreML / Emulator

Apache License 2.0

Problems during parallel computing with Dask #4

Open QianqianHan96 opened 1 year ago

QianqianHan96 commented 1 year ago

Based on the progress we made on June 27th, we managed to predict 100 timesteps with Dask. I also managed to reduce the size of the trained RF model from 15 GB to 245 MB, and I can now predict 200 timesteps (with 240 GB of memory). However, when I try to predict 1500 timesteps (the input data for 1500 timesteps is 297 MB), I always hit the worker memory limit (240 GB), no matter how many workers I use (4/32/64); threads_per_worker is always 1. Even when I requested 960 GB, I still hit the worker memory limit. When I use my Python script (https://github.com/EcoExtreML/Emulator/blob/main/1computationBlockTest/2read10kminput-halfhourly-0616.py) without Dask to predict the whole year (17000 timesteps), it uses only 100 GB of memory. I do not understand why Dask needs so much memory. Could you give me some advice on this problem? The script is at https://github.com/EcoExtreML/Emulator/blob/main/1computationBlockTest/2read10kminput-halfhourly-0628.ipynb.
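For context, the client setup looks roughly like this (a sketch only; the exact worker counts and memory limits vary per run):

```python
from dask.distributed import Client, LocalCluster

# Illustrative numbers: 4 single-threaded workers sharing 240 GB,
# i.e. a 60 GB per-worker limit, so the scheduler can spill or pause
# a worker before it is killed for exceeding its memory limit.
cluster = LocalCluster(
    n_workers=4,            # tried 4 / 32 / 64 in the runs above
    threads_per_worker=1,
    memory_limit="60GB",
)
client = Client(cluster)
# client.dashboard_link shows managed vs. unmanaged memory per worker
```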


SarahAlidoost commented 1 year ago

> When I use my python script without Dask to predict the whole year 17000 steps, it used 100 GB memory.

Please add a link to your python script in this issue.

SarahAlidoost commented 1 year ago

> The script is in 2read10kminput-halfhourly-0628.ipynb.

Please add a link to the notebook 2read10kminput-halfhourly-0628.ipynb in this issue.

QianqianHan96 commented 1 year ago

> > When I use my python script without Dask to predict the whole year 17000 steps, it used 100 GB memory.
>
> Please add a link to your python script in this issue.

Thanks for your advice, Sarah. I added it.

QianqianHan96 commented 1 year ago

> > The script is in 2read10kminput-halfhourly-0628.ipynb.
>
> Please add a link to the notebook 2read10kminput-halfhourly-0628.ipynb in this issue.

Thanks for your advice, Sarah. I added it.

QianqianHan96 commented 1 year ago

I have been trying to narrow down the problem from three angles: 1) loading the data, 2) preprocessing, including temporal and spatial resampling, and 3) the RF model. I found that data loading and preprocessing are not the problem; the RF model is what causes the memory issue.

When I load the RF model outside of the map_blocks function and then pass the model object to map_blocks, the unmanaged memory is extremely high. As I said in the first post of this issue, when I tried to predict 1500 timesteps (297 MB of input data), I always hit the worker memory limit (240 GB); I checked again, and the memory is mostly unmanaged.

After I changed to passing the model path to the map_blocks function instead of the model object, the unmanaged memory looks normal, similar to the managed memory. But the RF model is only 245 MB; I cannot understand why passing it causes so much unmanaged memory.

QianqianHan96 commented 1 year ago

This experiment is still for the 5 degree area. Predicting 2000 and 5000 timesteps is no problem now, but predicting 10000 timesteps failed until I increased either the memory or the number of CPUs, and predicting 17000 timesteps failed until I increased both. I have two questions: (1) While the code was running, the Dask dashboard showed that memory use was not very high, but the unmanaged memory was sometimes high. Why do we need 480 GB of memory for 3 GB of input data (17000 timesteps)? (2) When predicting 2000 timesteps, increasing the number of workers from 4 to 8 reduced the running time. However, predicting 5000 timesteps failed when I tried to use 8 workers. Why is this? If it stays like this, the running time cannot be improved.
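One thing I can still try for the high unmanaged memory is the workaround described in Dask's documentation on worker memory management for Linux: ask glibc to return freed heap pages to the operating system.

```python
import ctypes

def trim_memory() -> int:
    """Ask glibc to release free heap pages back to the OS (Linux only)."""
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

# Run it on every worker from the client, e.g. after a large computation:
# client.run(trim_memory)
```

Setting the `MALLOC_TRIM_THRESHOLD_` environment variable before starting the workers has a similar effect, according to the same documentation.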


SarahAlidoost commented 1 year ago

@geek-yang and @fnattino, see the issue here.

QianqianHan96 commented 1 year ago

@geek-yang and @fnattino, Hi Yang, Francesco, I prepared the two Jupyter notebooks. The code is on GitHub now: 1) 2000 timesteps: https://github.com/EcoExtreML/Emulator/blob/main/2daskParallel/2read10kminput-halfhourly-0904_2000steps.ipynb; 2) 1 year: https://github.com/EcoExtreML/Emulator/blob/main/2daskParallel/2read10kminput-halfhourly-0904_1year.ipynb. I listed the potential causes at the beginning of the 1-year notebook. Please let me know if anything is unclear. Thanks for your help!

QianqianHan96 commented 1 year ago

@geek-yang and @fnattino Hi Yang, Francesco,

I prepared the two Jupyter notebooks. The code is on GitHub now: 1) 1 year over a 10 degree area: https://github.com/EcoExtreML/Emulator/blob/main/2daskParallel/0921_1year_10degree.ipynb; 2) 1 year over Europe: https://github.com/EcoExtreML/Emulator/blob/main/2daskParallel/0921_1year_Europe.ipynb. The good news is that I managed to make the script run for the 10 degree area; the bad news is that the Europe area does not work yet.

For the 10 degree area, although I managed to make it run, it would be good if you could check whether the script is correct. Specifically, I have the following four questions:

  1. In Dask’s dashboard: how can I reduce the number of rechunk tasks? For both the 5 degree and the 10 degree data I chunk along time into 9 chunks, so why does the 5 degree data produce 170 rechunk tasks while the 10 degree data produces 1520? I also tried chunking the 10 degree data in both time and space, but there were still 1520 rechunk tasks. I do not know how to reduce them; or is 1520 the minimum for the 10 degree data (8 GB)?
  2. The ERA5Land data is automatically chunked in space, splitting 101 pixels into 95 and 6; the other datasets are not. So before the map_blocks call I rechunk ERA5Land by hand into a single piece (101*101 pixels), stored in the ds_ss variable. Could you check whether this is okay?
  3. When I chunk only in time, the running time is 47 seconds for the 5 degree data and 367 seconds for the 10 degree data. After I also chunked the 10 degree data in space (9 chunks in time * 4 chunks in space), the running time dropped to 293 seconds. All of the above used 4 workers with 2 threads each; with 8 workers, the 10 degree run took 207 seconds.
  4. As I understand it, chunking in space is different from chunking in time, because both the dynamic and the static variables need to be chunked in space. How can I make sure the chunks are processed in the same order inside the map_blocks function? I select the spatial extent of the static variables inside the "predictFlux" function based on the spatial extent of the dynamic variables. Could you check this part?
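A sketch of what I mean for questions 2 and 4 (variable and dimension names are assumptions, they may differ from the notebook): force ERA5Land back into a single spatial chunk so all inputs share the same chunking, and select the static pixels by the block's own coordinates inside the mapped function, so the alignment cannot depend on chunk order:

```python
import xarray as xr

def rechunk_whole_space(ds, n_time_chunks=9):
    """Undo the automatic spatial chunking: a single chunk in space,
    n_time_chunks chunks along time (matching the other inputs)."""
    time_size = max(1, ds.sizes["time"] // n_time_chunks)
    # -1 means "one chunk along this dimension"
    return ds.chunk({"time": time_size, "latitude": -1, "longitude": -1})

def predict_flux(dynamic_block, static_ds):
    """Called per block: pick the static pixels that match this block's
    own coordinates, so the chunk order cannot matter."""
    static_block = static_ds.sel(
        latitude=dynamic_block.latitude,
        longitude=dynamic_block.longitude,
    )
    # ... run the RF model on dynamic_block + static_block here ...
    return dynamic_block
```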

For the Europe area, I did not manage to make it run. The error is in cell 55 of https://github.com/EcoExtreML/Emulator/blob/main/2daskParallel/0921_1year_Europe.ipynb. It looks like a data-size problem, but even when I tried to predict only 151*151 pixels (the 10 degree area is 101*101 pixels), I got the same error.

All of the input data is on Snellius, so you can run my script there directly. I am using a fat node with 32 CPUs and 240 GB of memory.