ecpolley / SuperLearner

Current version of the SuperLearner R package

Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, : cannot open the connection #124

jedaniels-ucd closed this issue 5 years ago

jedaniels-ucd commented 5 years ago

A recurring problem: SuperLearner (specifically the CV.SuperLearner function) keeps spawning workers no matter how many mc.cores are specified, until the entire system dies under the weight of the jobs. I replicated this on several builds using different Linux distributions (Red Hat, Amazon, Ubuntu). I accidentally built one machine that works properly; the rest all failed. Manually creating, running, and stopping workers interactively in R seems to work fine.

ecpolley commented 5 years ago

Which algorithms did you include in the SL.library? In your CV.SuperLearner call, did you set the value for the parallel argument or use the default?

jedaniels-ucd commented 5 years ago

Algorithms appear to include: SL.library <- list("SL.mean", "SL.gbmmini2", "SL.nnet", "SL.glmnet", "SL.bayesglm", "SL.xgboost", "SL.gbmmini3")

CV.SuperLearner is using parallel="multicore", although the same error occurs whether or not parallel is specified.

[snip]

ecpolley commented 5 years ago

I suspect one of the algorithms in the library is causing the problem. If you remove the gbm algorithms (gbmmini[2|3] and xgboost), does it still give the error?
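A minimal sketch of that check, assuming the library reported earlier in this thread: drop the three suspected wrappers and rerun with whatever remains. The names `suspects` and `SL.library.reduced` are illustrative only and are not part of SuperLearner.

```r
# Library as reported earlier in this thread.
SL.library <- list("SL.mean", "SL.gbmmini2", "SL.nnet", "SL.glmnet",
                   "SL.bayesglm", "SL.xgboost", "SL.gbmmini3")

# Wrappers suspected of spawning their own workers (illustrative name).
suspects <- c("SL.gbmmini2", "SL.gbmmini3", "SL.xgboost")

# Keep every entry that is not a suspect.
SL.library.reduced <- SL.library[!(unlist(SL.library) %in% suspects)]

print(unlist(SL.library.reduced))
```

Passing `SL.library.reduced` as the SL.library argument of the same CV.SuperLearner call then shows whether the runaway workers disappear.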

jedaniels-ucd commented 5 years ago

Removing gbm* and xgboost does stabilize behavior. From a substantive point of view I am not sure of the consequences of removing these algorithms. Any suggestions for trying to salvage them if it becomes necessary? The one job that ran successfully might have just been a lucky seed or something?

jedaniels-ucd commented 5 years ago

Addendum: Just noticed gbmmini2/3 are custom algorithms, so that sounds more like the PI's problem than an actual SuperLearner bug.

ecpolley commented 5 years ago

You might need to look at the content of the gbmmini2/3 algorithms to see whether they are spawning multiple jobs and whether that can be restricted (you can test whether they are in fact the issue by adding back just xgboost). It does sound like one of the algorithms is causing the problem, not the SuperLearner code. In this case, you might be able to modify the custom algorithm to control it.
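As a rough illustration of what "modifying the custom algorithm" could look like: the gbmmini2/3 code is not shown in this thread, so the wrapper below is purely hypothetical (`SL.gbm.onecore` and its defaults are made-up names and values). It assumes the custom wrappers call gbm::gbm with cv.folds > 1. In that case gbm sends folds to its own workers and, when n.cores is not given, guesses the count with detectCores(), which is one possible source of extra socket workers, though nothing in the thread confirms it. Passing n.cores explicitly caps that.

```r
# Hypothetical wrapper: the real gbmmini2/3 definitions are not in this thread.
# It follows the usual SuperLearner wrapper shape and returns list(pred, fit).
SL.gbm.onecore <- function(Y, X, newX, family, obsWeights,
                           gbm.trees = 1000, interaction.depth = 2,
                           shrinkage = 0.01, cv.folds = 5, n.cores = 1, ...) {
  if (!requireNamespace("gbm", quietly = TRUE)) {
    stop("package 'gbm' is required for this wrapper")
  }
  # Map the SuperLearner family onto a gbm distribution name.
  dist <- if (family$family == "binomial") "bernoulli" else "gaussian"
  fit.gbm <- gbm::gbm(Y ~ ., data = data.frame(Y = Y, X),
                      distribution = dist,
                      n.trees = gbm.trees,
                      interaction.depth = interaction.depth,
                      shrinkage = shrinkage,
                      cv.folds = cv.folds,
                      weights = obsWeights,
                      verbose = FALSE,
                      n.cores = n.cores)  # cap gbm's own cross-validation workers
  # Choose the number of trees from gbm's internal cross-validation.
  best.iter <- gbm::gbm.perf(fit.gbm, method = "cv", plot.it = FALSE)
  pred <- predict(fit.gbm, newdata = newX, n.trees = best.iter,
                  type = "response")
  fit <- list(object = fit.gbm, n.trees = best.iter)
  class(fit) <- "SL.gbm"  # reuse SuperLearner's predict method for gbm fits
  list(pred = pred, fit = fit)
}
```

Swapping "SL.gbm.onecore" into SL.library in place of the custom gbm wrappers would then show whether the worker count stays bounded.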