Open partyom opened 4 years ago
We definitely want to keep the ability to run the scoring function in parallel - not all users of the package are tuning hyperparameters that allow multithreading. One option would be to allow parallel processing separately for the scoring function and the GP optimization.
I think the best solution would be to automatically set up parallelization if iters.k > 1, and have a second function parameter which allows the GP optimization to be run in parallel. Need to think through it though.
My datasets are relatively small, so lightgbm's internal parallelization doesn't bring any substantial gain, and most of the time is spent in the BayesOpt steps. Splitting the parallel processing could indeed be a way out.
Running bayesOpt with the parallel option set to true together with the R lightgbm package on Windows / R version 4.0 results in
`Error in unserialize(socklist[[n]]) : error reading from connection`.
It seems to be a problem with the lightgbm package itself, as `foreach(i = 1:2, .packages="lightgbm") %dopar%` crashes with the same error, whereas `foreach(i = 1:2, .packages="lightgbm") %do%` finishes without a problem. A related problem has also been reported previously for this package (see the links below).

As a feature idea, it might be worth considering keeping the lightgbm (or any other model) calculation in the main thread, since most such models come with their own internal parallelization logic, while fitting the Gaussian process and running the optimization on multiple cores.
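
For anyone who wants to check this on their own machine, the comparison above could be set up roughly as follows. This is only a sketch: it assumes the foreach and doParallel packages are installed and uses a two-worker socket cluster. The loop body `i` and the names `cl`, `res_par` and `res_seq` are placeholders, not taken from the original report.

```r
# Reproduction sketch; assumes foreach and doParallel are installed.
library(foreach)
library(doParallel)

# Socket (PSOCK) cluster with two workers, as used on Windows.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Parallel loop: each worker tries to load lightgbm before running the body.
# The body `i` is a placeholder; tryCatch keeps the script running and
# stores the error message if the workers fail.
res_par <- tryCatch(
  foreach(i = 1:2, .packages = "lightgbm") %dopar% i,
  error = function(e) conditionMessage(e)
)

parallel::stopCluster(cl)

# Same loop evaluated sequentially in the main R process.
res_seq <- foreach(i = 1:2, .packages = "lightgbm") %do% i
```

If `res_par` ends up holding an error message while `res_seq` holds the two loop results, the failure is tied to loading lightgbm on the workers rather than to the optimization code.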
Best regards, Artyom
1) https://github.com/microsoft/LightGBM/issues/1238
2) https://lightgbm.readthedocs.io/en/latest/FAQ.html#lightgbm-hangs-when-multithreading-openmp-and-using-forking-in-linux-at-the-same-time