Closed leonardtschora closed 4 years ago
Thanks for the write-up! Maybe I'm misunderstanding, or maybe the docs are not clear, but why didn't you just implement the ScikitLearnBase interface? If you don't need Python models, then your code can really be 100% Python-free, that was the goal.
Hmm, indeed it seems much simpler to just use ScikitLearnBase.jl... I don't recall seeing ScikitLearnBase.jl being mentioned in the ScikitLearn.jl documentation.
For the record, the above code translates to this (which is way less ugly):
import ScikitLearnBase: fit!, transform, predict
using ScikitLearnBase: get_params, set_params!, score, @declare_hyperparameters, BaseRegressor
using ScikitLearn.Pipelines: Pipeline
using ScikitLearn.GridSearch: GridSearchCV

# Elementwise affine map: a * X + b
f(X, a, b) = @. a * X + b

mutable struct SKBEstimator <: BaseRegressor
    a::Int        # hyperparameter
    b_::Float64   # learned in fit!
    SKBEstimator(; a=0) = new(a)
end
@declare_hyperparameters(SKBEstimator, [:a])

function fit!(model::SKBEstimator, X, y)
    model.b_ = sum(X) / length(X)
    return model
end

function transform(model::SKBEstimator, X)
    # Arguments follow f's signature: a first, then b
    f(X, model.a, model.b_)
end

function predict(model::SKBEstimator, X)
    return transform(model, X)[:, 1]
end

X = zeros(Int, 30, 2)
y = zeros(Int, 30)
skbt = SKBEstimator()
fit!(skbt, X, y)
transform(skbt, X)

# Try it in a pipeline
pipe = Pipeline([("Dummy", skbt)])
fit!(pipe, X, y)
transform(pipe, X)

# Try getting params
get_params(skbt)
set_params!(skbt; a=2)
get_params(skbt)

# Try score & predict
predict(skbt, X)
score(skbt, X, y)

# Try putting it in a grid search
grid = GridSearchCV(SKBEstimator(), Dict(:a => [-2, 3, 0, 40]))
fit!(grid, X, y)
grid.best_params_
And now a quick question: what is the purpose of "inheriting" from BaseClassifier vs defining is_classifier? And more generally, which default methods do BaseEstimator, BaseRegressor and BaseClassifier implement? (I think I spotted the score function for BaseRegressor.)
Thanks a lot
Yeah, you don't particularly need to inherit from these classes.
What do you mean? These classes actually do nothing?
Well, they do what you said: specify is_classifier and score to give good defaults. If you can, inherit, but if you can't (because you want to inherit from something else), then it's not a big problem to write is_classifier(::YourEstimator) yourself.
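As a rough illustration of that last point, here is a minimal sketch of an estimator that opts into the classifier behaviour by hand instead of inheriting from BaseClassifier. The names MyModelFamily and MajorityClassifier are hypothetical and only serve the example; the sketch assumes ScikitLearnBase provides is_classifier, fit!, predict and score as functions that can be extended with new methods, and it writes score explicitly because the default would normally come from the base type.

```julia
import ScikitLearnBase

# Hypothetical parent type, standing in for the "something else" you inherit from.
abstract type MyModelFamily end

# Hypothetical estimator that does NOT inherit from BaseClassifier.
mutable struct MajorityClassifier <: MyModelFamily
    majority_   # most frequent label seen during fit!
    MajorityClassifier() = new(nothing)
end

# Declare the trait by hand instead of getting it through inheritance.
ScikitLearnBase.is_classifier(::MajorityClassifier) = true

function ScikitLearnBase.fit!(model::MajorityClassifier, X, y)
    labels = unique(y)
    counts = [count(==(l), y) for l in labels]
    model.majority_ = labels[argmax(counts)]
    return model
end

# Always predict the majority label, one prediction per row of X.
ScikitLearnBase.predict(model::MajorityClassifier, X) =
    fill(model.majority_, size(X, 1))

# Without the base type there is no inherited default, so define accuracy directly.
ScikitLearnBase.score(model::MajorityClassifier, X, y) =
    sum(ScikitLearnBase.predict(model, X) .== y) / length(y)
```

To use such a type in a grid search you would presumably still need to declare its hyperparameters (for example with @declare_hyperparameters), as in the regressor example earlier in this thread.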
Okay thanks :)
Hi everyone, this post is an issue that comes with its solution. First, I would like to share it with other people, and second, I would like to get insights about what I did wrong in my code.
As the title says, this is about creating ScikitLearn estimators written exclusively in Julia.
Why?
Because I have strong feature extraction tools and models developed in Julia. I spent time optimizing the computations and I don't want to roll back to Python after that. However, I need ScikitLearn's tools for model selection (grid search, cross-validation, etc.) and pipelines. Estimators are particularly handy for my use case, and I need to be able to wrap my models and feature extraction tools in estimators to use them efficiently.
How?
There were 2 solutions:
- Develop estimators in Python and use pyjulia to call my Julia core code
- Use PyCall, and especially the @pydef macro, to write Python classes in Julia.
I selected the second option because this keeps my entire project free of any part written in Python.
The code
The main difficulties arose (at least I suppose they did) from passing data from Julia to Python and from Python to Julia. For instance, the get_params() and set_params() functions will not work properly (I already described this issue here). You can check the LeakyTransformer class to experience this error.
To solve most of these problems, I created a boilerplate class for all Julia-defined estimators: JuliaEstimator, which redefines get_params() and set_params() and proposes a default constructor.
To finish with, you can try the class FunctionTransformer, which wraps a Julia function in an estimator and can be fully used in grid searches and pipelines.
Conclusion
I have successfully implemented and used an estimator containing Julia code. Some of the design choices I made are highly questionable (the kwargs handling, for instance), but I hope that this bit of code will at least provide a solid example of how to create sklearn estimators in Julia.