I am using Julia v. 1.6.3 and I have a problem using JLD to save a RandomForestClassifier() model trained with ScikitLearn. Namely, when the number of samples and labels is too large, I get a segmentation fault.
Here is a working example to reproduce the error:
`
using ScikitLearn
using ScikitLearn.Pipelines
using PyCall, JLD, PyCallJLD
using Random
@sk_import ensemble: (RandomForestClassifier)
# Working example with 100 samples (45 features each) and 100 labels
x_vals = rand(100, 45)
y_vals = vec(rand([0, 1], 100, 1))

clf_model = RandomForestClassifier(n_estimators=500, bootstrap=true, oob_score=true, n_jobs=-1, class_weight="balanced_subsample")
fit!(clf_model, x_vals, y_vals)
oob_score_value = clf_model.oob_score_
println("Oob score: $oob_score_value")

JLD.save("clf_model_100.jld", "clf_model", clf_model)

# NOT working example with 10,000 samples (45 features each) and 10,000 labels
x_vals = rand(10000, 45)
y_vals = vec(rand([0, 1], 10000, 1))

clf_model = RandomForestClassifier(n_estimators=500, bootstrap=true, oob_score=true, n_jobs=-1, class_weight="balanced_subsample")
fit!(clf_model, x_vals, y_vals)
oob_score_value = clf_model.oob_score_
println("Oob score: $oob_score_value")

JLD.save("clf_model_10000.jld", "clf_model", clf_model)
`
Here is the error:
Could anyone help me understand what is going on?
UPDATE: I have downgraded Julia to v. 1.0.5 and this has solved the segmentation fault for the working example, although I get the following warning: