EcoExtreML / Emulator


one task does not run only on one core #2

Open · QianqianHan96 opened this issue 1 year ago

QianqianHan96 commented 1 year ago

When I train the RF model with one core, it takes 1 hour. If I set n_jobs=-1, it uses all cores and takes 9 minutes, so there is a big difference during training. However, when I predict, there is no difference: the predicting time is the same for both trained models, and neither of them uses only one core (more than 10 cores are running).

[screenshot]
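For reference, a minimal sketch (not the actual training script; the data sizes and hyperparameters below are placeholders) of how n_jobs on a scikit-learn RandomForestRegressor affects training and prediction time:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# placeholder data, just to illustrate the n_jobs comparison
X = np.random.rand(20_000, 20)
y = np.random.rand(20_000)

for n_jobs in (1, -1):  # one core vs. all cores
    model = RandomForestRegressor(n_estimators=100, n_jobs=n_jobs)

    t0 = time.time()
    model.fit(X, y)
    print(f"n_jobs={n_jobs}: training took {time.time() - t0:.1f} s")

    t0 = time.time()
    model.predict(X)  # predict() also honours the n_jobs set on the estimator
    print(f"n_jobs={n_jobs}: prediction took {time.time() - t0:.1f} s")
```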

QianqianHan96 commented 1 year ago

I realized why predicting uses more than one core. The problem is not the trained model (the predict() function), but the array data preparation that happens before predict() is called (see screenshot 1). I printed the running time of the array data preparation and of predict() for both models: the predict() time differs between the two models, which means the models are indeed different, but the array data preparation time is the same and takes up most of the total time (e.g., when I predict 4 timesteps, predict() takes 0.1 s per timestep, array data preparation takes 4.8 s, and the total time is 19 s, see screenshot 2).
I do not know why the code in screenshot 1 makes use of more than 20 cores, or how to control it. But in my opinion this part is the same for every spatial unit and timestep, so can we just ignore it, or do we still want to make everything run on one core?

[screenshot 1] [screenshot 2]
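A hedged sketch of the timing comparison described above; the real data-preparation code is in screenshot 1, so the model and the input layers here are placeholders:

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# placeholder model and input layers (the real ones come from the training
# step and the forcing data shown in screenshot 1)
model = RandomForestRegressor(n_estimators=10).fit(
    np.random.rand(100, 5), np.random.rand(100)
)
inputs = [np.random.rand(500, 500) for _ in range(5)]  # 5 input layers

t0 = time.time()
# array data preparation: reshape/concatenate the input layers into the
# 2-D (samples, features) array that predict() expects
features = np.concatenate([a.reshape(-1, 1) for a in inputs], axis=1)
t_prep = time.time() - t0

t0 = time.time()
prediction = model.predict(features)
t_pred = time.time() - t0

print(f"array data prepare: {t_prep:.2f} s, predict(): {t_pred:.2f} s")
```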

geek-yang commented 1 year ago

About multi-core utilization in your data preparation: NumPy's core is written in C and it can run operations in parallel in the backend (that is to say, NumPy is not constrained by Python's GIL, which is why its performance is so good). In this case you convert your array to NumPy and perform NumPy array operations like reshape and concatenate, so it is not surprising that NumPy uses multiple cores to accelerate your computation.
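If you do want to pin that preparation step to a single core, one common approach (a hedged sketch, not code from this repository) is to cap the thread pool used by NumPy's BLAS/OpenMP backend, either with environment variables or with threadpoolctl:

```python
import os

# Option 1: set these before importing numpy, so the OpenBLAS/MKL/OpenMP
# backends start with a single thread
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np
from threadpoolctl import threadpool_limits

a = np.random.rand(2000, 2000)

# Option 2: limit threads only inside a specific block
with threadpool_limits(limits=1):
    b = a @ a  # BLAS call, now restricted to one thread
```

Note that plain reshape/concatenate calls are not themselves multithreaded; the extra cores typically come from the linear-algebra backend, so these limits mainly matter for the BLAS-backed parts of the pipeline.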