AnotherSamWilson / ParBayesianOptimization

Parallelizable Bayesian Optimization in R

Running lightgbm with doParallel fails #20

Open partyom opened 3 years ago

partyom commented 3 years ago

Running `bayesOpt` with the `parallel` option set to `TRUE` on the R lightgbm package (Windows, R version 4.0) results in `Error in unserialize(socklist[[n]]) : error reading from connection`.

The problem seems to lie with the lightgbm package, as `foreach(i = 1:2, .packages="lightgbm") %dopar%` crashes with the same error, whereas `foreach(i = 1:2, .packages="lightgbm") %do%` finishes without a problem. A related problem has previously been reported for this package (links 1 and 2 below).
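The comparison above could be reproduced with something like the following sketch. It assumes that foreach, doParallel and lightgbm are installed and that a two-worker PSOCK cluster is used; the helper name `repro_dopar` and the loop body `i` are placeholders, since the original report does not show the loop body.

```r
# Reproduction sketch. `repro_dopar` is a hypothetical helper and the loop
# body `i` is a placeholder; the original report does not show the body.
repro_dopar <- function(pkg = "lightgbm", workers = 2) {
  needed <- c("foreach", "doParallel", pkg)
  ok <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
  absent <- needed[!ok]
  if (length(absent) > 0) {
    # Nothing to reproduce without the packages.
    return(list(status = "skipped",
                detail = paste("not installed:", paste(absent, collapse = ", "))))
  }
  `%do%` <- foreach::`%do%`
  `%dopar%` <- foreach::`%dopar%`

  cl <- parallel::makeCluster(workers)  # PSOCK cluster, the default type
  on.exit(try(parallel::stopCluster(cl), silent = TRUE), add = TRUE)
  doParallel::registerDoParallel(cl)

  # Sequential loop: reported to finish without a problem.
  seq_res <- foreach::foreach(i = 1:2, .packages = pkg) %do% i

  # Parallel loop: reported to fail with "error reading from connection".
  # The error message is captured instead of aborting the session.
  par_res <- tryCatch(
    foreach::foreach(i = 1:2, .packages = pkg) %dopar% i,
    error = function(e) conditionMessage(e)
  )
  list(status = "ran", sequential = seq_res, parallel = par_res)
}

result <- repro_dopar()
print(result$status)
```

If `result$parallel` comes back as a character string rather than a list, the parallel loop failed and the string holds the error message.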

As a feature idea, it might be worth considering keeping the lightgbm (or any other model) calculation in the main thread, since most such models come with their own internal parallelization logic, while fitting the Gaussian process and running the optimization on multiple cores.

Best regards, Artyom

1) https://github.com/microsoft/LightGBM/issues/1238
2) https://lightgbm.readthedocs.io/en/latest/FAQ.html#lightgbm-hangs-when-multithreading-openmp-and-using-forking-in-linux-at-the-same-time

AnotherSamWilson commented 3 years ago

We definitely want to keep the ability to run the scoring function in parallel - not all users of the package are tuning hyperparameters that allow multithreading. One option would be to allow parallel processing separately for the scoring function and the GP optimization.

I think the best solution would be to automatically set up parallelization if `iters.k > 1`, and have a second function parameter which allows the GP optimization to be run in parallel. Need to think through it though.
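To make the idea of two independent switches concrete, here is a rough sketch using only base R's parallel package. None of these helpers or argument names (`score_candidates`, `optimise_acquisition`, `parallel_score`, `parallel_gp`) exist in ParBayesianOptimization; they are hypothetical, and the multi-start `optim` call is only a stand-in for the GP optimization step.

```r
# Hypothetical sketch only: these helpers and argument names are NOT part of
# ParBayesianOptimization; they illustrate two independent parallel switches.

# Top-level workers, so no enclosing function environment is shipped to the cluster.
call_with <- function(p, score_fun) do.call(score_fun, p)
local_opt <- function(start, acq) stats::optim(start, acq, method = "L-BFGS-B")

# Apply `fun` over `xs`, on a temporary PSOCK cluster when `use_cluster` is TRUE.
apply_maybe_parallel <- function(xs, fun, use_cluster, workers = 2, ...) {
  if (!use_cluster) return(lapply(xs, fun, ...))
  cl <- parallel::makeCluster(workers)
  on.exit(parallel::stopCluster(cl), add = TRUE)
  parallel::parLapply(cl, xs, fun, ...)
}

# Scoring: parallel by default only when several candidates are scored per
# iteration (the role iters.k plays in the comment above).
score_candidates <- function(FUN, candidates,
                             parallel_score = length(candidates) > 1) {
  apply_maybe_parallel(candidates, call_with, parallel_score, score_fun = FUN)
}

# Stand-in for the GP optimization step: multi-start minimisation of an
# acquisition-like function, controlled by its own switch.
optimise_acquisition <- function(acq, starts, parallel_gp = FALSE) {
  runs <- apply_maybe_parallel(starts, local_opt, parallel_gp, acq = acq)
  values <- vapply(runs, function(r) r$value, numeric(1))
  runs[[which.min(values)]]
}

# Example: score in the main process, e.g. when the model is multithreaded itself.
f <- function(x, y) list(Score = -(x - 1)^2 - (y + 2)^2)
scores <- score_candidates(f, list(list(x = 0, y = 0), list(x = 1, y = -2)),
                           parallel_score = FALSE)
```

With this split, a user with a multithreaded model would pass `parallel_score = FALSE` and `parallel_gp = TRUE`, while a user with a single-threaded scoring function could leave the scoring default in place.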

partyom commented 3 years ago

My datasets are relatively small, so lightgbm's internal parallelization doesn't bring any substantial gain, and most of the time is spent in the BayesOpt steps. Splitting the parallel processing could indeed be a way out.