cstjean / ScikitLearn.jl

Julia implementation of the scikit-learn API https://cstjean.github.io/ScikitLearn.jl/dev/

Importing LabelEncoder without loading ScikitLearn #62

Closed smsinks closed 4 years ago

smsinks commented 4 years ago

I am taking some datasets, cleaning them, and encoding the categorical variables using tools available in ScikitLearn, then running XGBoost on the clean data.

However, I cannot make predictions using the trained XGBoost model because both ScikitLearn and XGBoost export a function named predict. Refer to the error message below:

WARNING: both ScikitLearn and XGBoost export "predict"; uses of it in module Main must be qualified
ERROR: LoadError: UndefVarError: predict not defined

The problem is that I cannot call the predict function for XGBoost as XGBoost.predict, because this does not work, and it is the only solution that I know of.

Further, I cannot find or understand how to load only LabelEncoder from ScikitLearn without loading all of ScikitLearn, and thus the predict function. For example, the following forms:

using ScikitLearn:LabelEncoder
import ScikitLearn:LabelEncoder
import ScikitLearn:Preprocessing,LabelEncoder

None of these work.

Looking forward to your help.

alexmorley commented 4 years ago

Could you try to create a minimal working example for this? Further, I'm not sure what you mean by:

The problem is that I cannot call the predict function for XGBoost as XGBoost.predict, because this does not work, and it is the only solution that I know of.

If you qualify all of your uses of predict like this it should work fine. What error do you get in this case?
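To illustrate what qualifying a call means here, below is a minimal sketch that reproduces the name clash without either package installed. The modules ToyBooster and ToyLearn are hypothetical stand-ins (they are not XGBoost or ScikitLearn); the only thing they share with the real packages is that both export a function named predict.

```julia
# Hypothetical stand-in modules; both export `predict`, like the two real packages.
module ToyBooster
export predict
predict(model, data) = "ToyBooster.predict"
end

module ToyLearn
export predict
predict(model, data) = "ToyLearn.predict"
end

using .ToyBooster
using .ToyLearn

# An unqualified `predict(...)` would be ambiguous at this point.
# Prefixing the call with the module name selects one implementation explicitly.
booster_result = ToyBooster.predict(nothing, [1.0, 2.0])
learn_result = ToyLearn.predict(nothing, [1.0, 2.0])
```

With the real packages, the equivalent call would presumably be XGBoost.predict(bst, xTrain), assuming both packages are loaded with using as in the report.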

smsinks commented 4 years ago

@alexmorley

Thanks for the response. Please refer to the minimal code I have pasted below.

println("\n Running model \n")
using XGBoost
using ScikitLearn

# The commented-out statements below fail to import LabelEncoder on its own
# using ScikitLearn:Preprocessing.LabelEncoder
# import ScikitLearn:Preprocessing.LabelEncoder
# import ScikitLearn:LabelEncoder
# using ScikitLearn.LabelEncoder

# but this works
using ScikitLearn: fit_transform!

# dummy dataset
xTrain = rand(100,5)
yTrain = vec(rand(1,100))

param = ["booster"=>"gblinear", "eta"=>1, "silent"=>0,
         "objective"=>"reg:linear","eval_metric"=>"rmse"]

# Fit Model
num_round = 2
bst = xgboost(xTrain, num_round, label = yTrain, param = param)

# Predict: this fails because both XGBoost and ScikitLearn export predict
pred = predict(bst, xTrain)

smsinks commented 4 years ago

I have figured it out. Instead of trying to import specific names from ScikitLearn, which seems to be hard, I simply imported the predict function from XGBoost explicitly, as follows:

using XGBoost
using XGBoost: predict
using ScikitLearn

This takes care of my ScikitLearn problems.
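A minimal sketch of why this works, again using hypothetical stand-in modules (ToyBoosterB and ToyLearnB are made up for illustration): a name imported explicitly with `using Module: name` takes precedence over names that are only made available through another module's exports, so the unqualified call is no longer ambiguous.

```julia
# Hypothetical stand-in modules; both export `predict`.
module ToyBoosterB
export predict
predict(model, data) = "booster prediction"
end

module ToyLearnB
export predict, fit!
predict(model, data) = "learn prediction"
fit!(model, data) = "fitted"
end

using .ToyBoosterB: predict   # bind `predict` explicitly to ToyBoosterB's version
using .ToyLearnB              # the other exports, such as `fit!`, stay available

resolved = predict(nothing, [1.0])   # dispatches to ToyBoosterB.predict
fitted = fit!(nothing, [1.0])        # still reachable from ToyLearnB
```

The other module's version remains reachable through a qualified call, here ToyLearnB.predict.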

alexmorley commented 4 years ago

Good to hear.