cstjean / ScikitLearn.jl

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

ScikitLearn.jl - JLD.jl segmentation fault error #104

Open CBongiova opened 2 years ago

CBongiova commented 2 years ago

Hi,

I am using Julia v1.6.3 and I have a problem using JLD to save a RandomForestClassifier() model trained with ScikitLearn. Specifically, when the number of samples and labels is too large, I get a segmentation fault.

Here is a working example to reproduce the error:

```julia
using ScikitLearn
using ScikitLearn.Pipelines
using PyCall, JLD, PyCallJLD
using Random
@sk_import ensemble: (RandomForestClassifier)

# Working example with 100 samples (45 features each) and 100 labels
x_vals = rand(100, 45)
y_vals = vec(rand([0, 1], 100, 1))

clf_model = RandomForestClassifier(n_estimators=500, bootstrap=true, oob_score=true, n_jobs=-1, class_weight="balanced_subsample")
fit!(clf_model, x_vals, y_vals)
oob_score_value = clf_model.oob_score_
println("Oob score: $oob_score_value")

JLD.save("clf_model_100.jld", "clf_model", clf_model)

# NOT working example with 10,000 samples (45 features each) and 10,000 labels
x_vals = rand(10000, 45)
y_vals = vec(rand([0, 1], 10000, 1))

clf_model = RandomForestClassifier(n_estimators=500, bootstrap=true, oob_score=true, n_jobs=-1, class_weight="balanced_subsample")
fit!(clf_model, x_vals, y_vals)
oob_score_value = clf_model.oob_score_
println("Oob score: $oob_score_value")

JLD.save("clf_model_10000.jld", "clf_model", clf_model)
```

Here is the error:

```
signal (11): Segmentation fault: 11
in expression starting at /Users/admin/Desktop/Online2/Train_ML_new.jl:998
jl_exit_thread0_cb at /Applications/Julia-1.6.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.dylib (unknown line)
Allocations: 86747943 (Pool: 86717037; Big: 30906); GC: 83
```

Could anyone help me understand what is going on?

UPDATE: I downgraded Julia to v1.0.5, which resolved the segmentation fault for the example above, although I now get the following warning:

```
┌ Warning: JLD incorrectly extends FileIO functions (see FileIO documentation)
└ @ FileIO ~/.julia/packages/FileIO/DNKwN/src/loadsave.jl:217
```