Open mratsim opened 7 years ago
Hi, thank you for filing an issue about this. That's definitely a bug. I think that DataFrames have never been tested as input to grid search. I just removed the AbstractArray type. Could you please try it out again? (Pkg.checkout("ScikitLearn"))
I'll have more time to look into it tomorrow.
Pull requests are welcome.
It looks like this isn't possible with scikit-learn in Python either. See https://github.com/paulgb/sklearn-pandas/issues/61. Some solutions are proposed in https://github.com/paulgb/sklearn-pandas/pull/62 and https://github.com/paulgb/sklearn-pandas/pull/64.
The primary challenge is to implement get_params/set_params for DataFrameMapper. Here's the code I used to test it:
using DataFrames: DataFrame
using ScikitLearn
using ScikitLearn.GridSearch: GridSearchCV
@sk_import ensemble: RandomForestClassifier
@sk_import preprocessing: StandardScaler
X_train = DataFrame(Any[randn(100), randn(100)], [:a, :b])
Y_train = rand(0:1, 100)
mapper = DataFrameMapper([([:a, :b], StandardScaler())])
pipe = Pipelines.Pipeline([
    ("featurize", mapper),
    ("forest", RandomForestClassifier(n_estimators=200))
])
# GridSearch
grid = Dict(:forest__n_estimators => 10:30:240)
gridsearch = GridSearchCV(pipe, grid)
fit!(gridsearch, X_train, Y_train)
println("Best hyper-parameters: $(gridsearch.best_params_)")
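To make the get_params/set_params challenge mentioned above more concrete, here is a minimal sketch of what such a parameter protocol could look like for a mapper that wraps other transformers. It assumes Julia 1.x, and every name in it (ToyScaler, ToyMapper, toy_get_params, toy_set_params!, the "t1__" key prefix) is hypothetical and written only for illustration. It is not the ScikitLearn.jl implementation and not one of the fixes proposed in the linked pull requests.

```julia
# Hypothetical stand-ins; these are NOT the ScikitLearn.jl types.
mutable struct ToyScaler
    with_mean::Bool
end

mutable struct ToyMapper
    # Each entry pairs a list of column names with the transformer applied to them.
    features::Vector{Tuple{Vector{Symbol}, ToyScaler}}
end

# Flatten the nested transformers' fields into "t<index>__<field>" keys,
# loosely mirroring the step__param convention used in the grid above.
function toy_get_params(m::ToyMapper)
    params = Dict{Symbol, Any}()
    for (i, (_, transformer)) in enumerate(m.features)
        for name in fieldnames(typeof(transformer))
            params[Symbol("t$(i)__$(name)")] = getfield(transformer, name)
        end
    end
    return params
end

# Route each "t<index>__<field>" keyword back to the matching nested transformer.
function toy_set_params!(m::ToyMapper; kwargs...)
    for (key, value) in kwargs
        prefix, name = split(String(key), "__")
        index = parse(Int, prefix[2:end])   # drop the leading "t"
        setfield!(m.features[index][2], Symbol(name), value)
    end
    return m
end

toy_mapper = ToyMapper([([:a, :b], ToyScaler(true))])
toy_set_params!(toy_mapper; t1__with_mean=false)
```

A grid search needs both directions: it reads the parameters to clone an estimator and writes them to try each candidate, so a mapper that hides its nested transformers cannot be tuned or copied reliably.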
Hello again Cédric,
Following your help on transformers, I am now trying to use a GridSearch to optimize the hyperparameters of a RandomForest.
I have a pipeline with many transformers that works great with cross-validation and actual prediction. However, I get a type error when trying to use it in a GridSearchCV. It seems like there is an extra argument of type ScikitLearn.Skcore.ParameterGrid in my setup:
The error I get is:
So the call being made is _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid), but the method expects an array instead of a DataFrame. The thing is, the DataFrame should have been converted away by the DataFrameMapper.
If needed, the full code is here: https://github.com/mratsim/MachineLearning_Kaggle/blob/9c07a64a981a6512e021ae01623212a278fd05d1/Kaggle%20-%20001%20-%20Titanic%20Survivors/Kaggle-001-Julia-MagicalForest.jl#L530
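The kind of mismatch described in this post can be reproduced in isolation. The sketch below assumes Julia 1.x and uses hypothetical names throughout (ToyTable, strict_fit, loose_fit); it does not reproduce the actual ScikitLearn.jl method signatures. It only illustrates how an AbstractArray annotation on an outer fit method rejects a table-like input before any mapper inside the pipeline has a chance to convert it.

```julia
# Hypothetical stand-in for a DataFrame: a table type that is not an AbstractArray.
struct ToyTable
    columns::Dict{Symbol, Vector{Float64}}
end

# Signature restricted to arrays: a ToyTable can never dispatch to this method.
strict_fit(X::AbstractArray, y::Vector{Int}) = "fit on $(size(X, 1)) rows"

# Unrestricted signature: any input is accepted, leaving conversion to later steps.
loose_fit(X, y::Vector{Int}) = "accepted $(typeof(X))"

table = ToyTable(Dict(:a => randn(4), :b => randn(4)))
labels = [0, 1, 0, 1]

# Calling the strict method with a table raises a MethodError.
rejected = try
    strict_fit(table, labels)
    false
catch err
    err isa MethodError
end

accepted = loose_fit(table, labels)
```

Relaxing the annotation on the outer method lets the first pipeline step decide how to turn the table into an array.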