Closed: @SimonCrouzet closed this issue 1 year ago.
Thank you @SimonCrouzet for submitting this issue. I still need to look further into why your example triggers the error you show; in the meantime, can you test with this branch: https://github.com/IQVIA-ML/TreeParzen.jl/tree/test-tell-bug and tell me whether it sorts the issue?
Thanks @kainkad for the quick reply! I tried with your test-tell-bug branch and it's now working.
Let me know if I can be of any help for further investigation.
Hi @SimonCrouzet.
After investigating the `tell!` signature issue further, I think it is not necessary to change the signature in the TreeParzen source code; you can actually use it as it is without running into a similar issue with the `tell!` signature. The problem seems to have been related to an interaction between dependencies which affected `XGBoostClassifier`. Removing `Manifest.toml` from the project and then letting the package manager instantiate all dependencies (`julia --project`, then `] instantiate`) resolved the issue.
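For reference, the same Manifest regeneration can be scripted from the REPL. This is a minimal sketch (it assumes you run it from the project directory containing `Project.toml`):

```julia
using Pkg

Pkg.activate(".")                # activate the project in the current directory
rm("Manifest.toml"; force=true)  # drop the stale manifest, if one is present
Pkg.instantiate()                # re-resolve and install all dependencies from Project.toml
Pkg.status()                     # print the freshly resolved dependency versions
```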
I managed to reproduce the issue you raised with the example below. As I didn't have access to the same data and some of the variables you provided, I used the following, which was enough to trigger the same error you got:

- `XGBoostClassifier` from MLJXGBoostInterface;
- `import` rather than `using`, so that it is clear from which package the functions were imported;
- `X` converted to a `DataFrame` to match your example (the same error is triggered with `X` as a `NamedTuple`);
- `y` in this example is `CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}}` (slightly different from yours, but still enough to trigger the issue).

```julia
import MLJBase, MLJXGBoostInterface, MLJTuning, TreeParzen, DataFrames
import MLJXGBoostInterface.XGBoostClassifier

X, y = MLJBase.@load_iris
# @load_iris returns:
# X: NamedTuple{(:sepal_length, :sepal_width, :petal_length, :petal_width), NTuple{4, Vector{Float64}}}
# y: CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}}
# so I have converted X to a DataFrame as in the issue #94 example,
# but it works with X as either a NamedTuple or a DataFrame.
Features = DataFrames.DataFrame(X)

xgbclassifier = MLJXGBoostInterface.XGBoostClassifier()

space = Dict(
    :gamma => TreeParzen.HP.LogNormal(:gamma, log(0.5), 3.),
    :max_depth => TreeParzen.HP.QuantUniform(:max_depth, 2., 50., 1.0),
    :min_child_weight => TreeParzen.HP.QuantUniform(:min_child_weight, 2., 10., 1.),
    :max_delta_step => TreeParzen.HP.LogNormal(:max_delta_step, log(1.), 5.),
    :lambda => TreeParzen.HP.LogNormal(:lambda, log(0.05), .2),
    :alpha => TreeParzen.HP.LogNormal(:alpha, log(0.05), .2),
    :max_leaves => TreeParzen.HP.Choice(:max_leaves, [0, 2, 4, 8, 16, 32]),
)

num_cv_folds = 4
rand_trials = 3
simultaneous_draws = 2
StrCV = MLJBase.CV(nfolds=num_cv_folds)
num_models_trained = 25
metrics = MLJBase.LogLoss()

tuned_classifier_model = MLJTuning.TunedModel(
    model=xgbclassifier,
    tuning=TreeParzen.MLJTreeParzen.MLJTreeParzenTuning(;
        random_trials=rand_trials,
        max_simultaneous_draws=simultaneous_draws,
        linear_forgetting=simultaneous_draws*25,
    ),
    resampling=StrCV,
    repeats=5,
    n=num_models_trained,
    range=space,
    measure=metrics,
)

tuned_classifier = MLJBase.machine(tuned_classifier_model, Features, y)
MLJBase.fit!(tuned_classifier, verbosity=1)
```
And I got the below error:
```
[ Info: Training machine(ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …), …).
[ Info: Attempting to evaluate 100 models.
Evaluating over 20 metamodels: 100%[=========================] Time: 0:00:16
┌ Error: Problem fitting the machine machine(ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …), …).
└ @ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:682
[ Info: Running type checks...
[ Info: Type checks okay.
ERROR: LoadError: MethodError: no method matching tell!(::TreeParzen.Trials.Trial, ::Float32)
Closest candidates are:
  tell!(::TreeParzen.Trials.Trial, ::Float64) at ~/.julia/packages/TreeParzen/JnYrc/src/API.jl:61
Stacktrace:
  [1] (::TreeParzen.MLJTreeParzen.var"#2#3")(entry::NamedTuple{(:model, :measure, :measurement, :per_fold, :metadata), Tuple{XGBoostClassifier, Vector{LogLoss{Float64}}, Vector{Float32}, Vector{Vector{Float32}}, TreeParzen.Trials.Trial}})
    @ TreeParzen.MLJTreeParzen ~/.julia/packages/TreeParzen/JnYrc/src/MLJTreeParzen.jl:216
  [2] iterate
    @ ./generator.jl:47 [inlined]
  [3] _collect
    @ ./array.jl:695 [inlined]
  [4] collect_similar
    @ ./array.jl:606 [inlined]
  [5] map
    @ ./abstractarray.jl:2294 [inlined]
  [6] get_trialhist
    @ ~/.julia/packages/TreeParzen/JnYrc/src/MLJTreeParzen.jl:209 [inlined]
  [7] models(strategy::MLJTreeParzenTuning, model::XGBoostClassifier, history::Vector{NamedTuple{(:model, :measure, :measurement, :per_fold, :metadata), Tuple{XGBoostClassifier, Vector{LogLoss{Float64}}, Vector{Float32}, Vector{Vector{Float32}}, TreeParzen.Trials.Trial}}}, state::NamedTuple{(:space, :trialhist), Tuple{MLJTreeParzenSpace, Vector{TreeParzen.Trials.Trial}}}, remaining::Int64, verbosity::Int64)
    @ TreeParzen.MLJTreeParzen ~/.julia/packages/TreeParzen/JnYrc/src/MLJTreeParzen.jl:238
  [8] build!(history::Nothing, n::Int64, tuning::MLJTreeParzenTuning, model::XGBoostClassifier, model_buffer::Channel{Any}, state::NamedTuple{(:space, :trialhist), Tuple{MLJTreeParzenSpace, Vector{TreeParzen.Trials.Trial}}}, verbosity::Int64, acceleration::CPU1{Nothing}, resampling_machine::Machine{Resampler{CV}, false})
    @ MLJTuning ~/.julia/packages/MLJTuning/ZFg3R/src/tuned_models.jl:644
  [9] fit(::MLJTuning.ProbabilisticTunedModel{MLJTreeParzenTuning, XGBoostClassifier}, ::Int64, ::NamedTuple{(:sepal_length, :sepal_width, :petal_length, :petal_width), NTuple{4, Vector{Float64}}}, ::CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}})
    @ MLJTuning ~/.julia/packages/MLJTuning/ZFg3R/src/tuned_models.jl:747
 [10] fit_only!(mach::Machine{MLJTuning.ProbabilisticTunedModel{MLJTreeParzenTuning, XGBoostClassifier}, false}; rows::Nothing, verbosity::Int64, force::Bool, composite::Nothing)
    @ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:680
 [11] fit_only!
    @ ~/.julia/packages/MLJBase/97P9U/src/machines.jl:614 [inlined]
 [12] #fit!#57
    @ ~/.julia/packages/MLJBase/97P9U/src/machines.jl:778 [inlined]
 [13] fit!(mach::Machine{MLJTuning.ProbabilisticTunedModel{MLJTreeParzenTuning, XGBoostClassifier}, false})
    @ MLJBase ~/.julia/packages/MLJBase/97P9U/src/machines.jl:776
 [14] top-level scope
    @ ~/OpenSource/tell_demo/src/mlj_integration_xbgoostclassifier.jl:31
in expression starting at ~/tell_demo/src/mlj_integration_xbgoostclassifier.jl:31
```
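The failure mode in that stacktrace can be reproduced in miniature: the per-fold measurements arrive as `Float32`, while `tell!` is only defined for `Float64`, and Julia's multiple dispatch does not convert positional arguments to find a matching method. A hypothetical stand-in (not TreeParzen's actual code) shows the same class of error:

```julia
# Hypothetical stand-in for a Float64-only API like tell!, to illustrate the dispatch failure.
struct FakeTrial end

report_loss!(::FakeTrial, loss::Float64) = loss  # only Float64 is accepted

trial = FakeTrial()
report_loss!(trial, 0.5)               # Float64 literal: dispatches fine

try
    report_loss!(trial, Float32(0.5))  # Float32: no matching method
catch err
    @assert err isa MethodError        # same error class as in the report above
end

# Caller-side workaround: convert before calling.
report_loss!(trial, Float64(Float32(0.5)))
```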
I have then tried the same data with another classifier, to check whether the same error occurs, and everything worked fine. I used `MLJDecisionTreeInterface.DecisionTreeClassifier` with a slightly trimmed space, with hyperparameters matching the ones required by the `DecisionTreeClassifier`, and left the rest the same as in the example above.

```julia
space = Dict(
    :min_purity_increase => TreeParzen.HP.Uniform(:min_purity_increase, 0.0, 1.0),
    :merge_purity_threshold => TreeParzen.HP.Uniform(:merge_purity_threshold, 0.0, 1.0),
)

dtc = MLJDecisionTreeInterface.DecisionTreeClassifier()
```
After deleting `Manifest.toml` and letting the Pkg manager deal with the dependencies via `instantiate`, the initial XGBoostClassifier example worked fine:
```
[ Info: Training machine(ProbabilisticTunedModel(model = XGBoostClassifier(test = 1, …), …), …).
[ Info: Attempting to evaluate 25 models.
Evaluating over 3 metamodels: 100%[=========================] Time: 0:00:19
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
Evaluating over 2 metamodels: 100%[=========================] Time: 0:00:00
```
In case it helps, the status of the project dependencies I tested this on looks like this:
```
(tell_demo) pkg> status
Project tell_demo v0.1.0
Status `~/OpenSource/tell_demo/Project.toml`
  [324d7699] CategoricalArrays v0.10.7
  [a93c6f00] DataFrames v1.5.0
  [add582a8] MLJ v0.19.1
  [a7f614a8] MLJBase v0.21.8
  [d491faf4] MLJModels v0.16.4
  [03970b2e] MLJTuning v0.7.4
  [54119dfa] MLJXGBoostInterface v0.3.7
  [eb66a70c] TreeParzen v0.3.2
```
Would it be possible for you to do the same: delete the local `Manifest.toml` in your project, let `Pkg` sort out the dependencies, and try running your example again? Please use the currently released `TreeParzen@v0.3.2`.
If the issue is still present, would it be possible for you to share the status of your dependencies and let me know your Julia version as well?
Hello @kainkad,
It seems like it's working, yes! I don't know exactly which dependency the problem was coming from, but switching back to v0.3.2 and deleting `Manifest.toml` resolved my issue. Thanks for your help!
Getting a clean new `Manifest.toml` is always good practice. There could have been updates to any of the packages listed in `Project.toml` (or their dependencies) since `Manifest.toml` was last generated, so by removing it and instantiating we let the Julia `Pkg` manager sort out all the other dependencies accordingly.
Is it ok to close the issue then?
Hello,
Thanks for your work on the package!
I noticed a bug when trying to run it with my instance: it crashes at the end of the fitting, with the following message:
My variable `y` is a categorical array of type `CategoricalArray{String15,1,UInt32}` and my variable `X` is a `DataFrame` with `Float64` values; I am not sure where those `Float32` come from. Anyway, maybe `tell!()` should accept (or convert) non-`Float64` values?
Thanks, and have a nice day!