Julia: v1.10.2, CatBoost: v0.3.4

MWE:
julia> using CatBoost.MLJCatBoostInterface
julia> using DataFrames
julia> using MLJBase
# Initialize data
julia> train_data = DataFrame([[1, 4, 30], [4, 5, 40], [5, 6, 50], [6, 7, 60]], :auto)
3×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     1      4      5      6
   2 │     4      5      6      7
   3 │    30     40     50     60
julia> train_labels = [10.0, 20.0, 30.0]
3-element Vector{Float64}:
10.0
20.0
30.0
julia> eval_data = DataFrame([[2, 1], [4, 4], [6, 50], [8, 60]], :auto)
2×4 DataFrame
 Row │ x1     x2     x3     x4
     │ Int64  Int64  Int64  Int64
─────┼────────────────────────────
   1 │     2      4      6      8
   2 │     1      4     50     60

# Initialize CatBoostRegressor
julia> model = CatBoostRegressor(; iterations=2, learning_rate=1.0, depth=2)
CatBoostRegressor(
iterations = 2,
learning_rate = 1.0,
depth = 2,
l2_leaf_reg = 3.0,
model_size_reg = 0.5,
rsm = 1.0,
loss_function = "RMSE",
border_count = nothing,
feature_border_type = nothing,
per_float_feature_quantization = nothing,
input_borders = nothing,
output_borders = nothing,
fold_permutation_block = 1,
nan_mode = "Min",
counter_calc_method = "SkipTest",
leaf_estimation_iterations = nothing,
leaf_estimation_method = nothing,
thread_count = -1,
random_seed = nothing,
metric_period = 1,
ctr_leaf_count_limit = nothing,
store_all_simple_ctr = false,
max_ctr_complexity = nothing,
has_time = false,
allow_const_label = false,
target_border = nothing,
one_hot_max_size = nothing,
random_strength = 1.0,
custom_metric = nothing,
bagging_temperature = 1.0,
fold_len_multiplier = 2.0,
used_ram_limit = nothing,
gpu_ram_part = 0.95,
pinned_memory_size = 1073741824,
allow_writing_files = nothing,
approx_on_full_history = false,
boosting_type = nothing,
simple_ctr = nothing,
combinations_ctr = nothing,
per_feature_ctr = nothing,
ctr_target_border_count = nothing,
task_type = nothing,
devices = nothing,
bootstrap_type = nothing,
subsample = nothing,
sampling_frequency = "PerTreeLevel",
sampling_unit = "Object",
gpu_cat_features_storage = "GpuRam",
data_partition = nothing,
early_stopping_rounds = nothing,
grow_policy = "SymmetricTree",
min_data_in_leaf = 1,
max_leaves = 31,
leaf_estimation_backtracking = "AnyImprovement",
feature_weights = nothing,
penalties_coefficient = 1.0,
model_shrink_rate = nothing,
model_shrink_mode = "Constant",
langevin = false,
diffusion_temperature = 10000.0,
posterior_sampling = false,
boost_from_average = nothing,
text_processing = nothing)
julia> mach = machine(model, train_data, train_labels)
untrained Machine; caches model-specific representations of data
  model: CatBoostRegressor(iterations = 2, …)
  args:
    1: Source @487 ⏎ Table{AbstractVector{Count}}
    2: Source @087 ⏎ AbstractVector{Continuous}

# Fit model
julia> MLJBase.fit!(mach)
[ Info: Training machine(CatBoostRegressor(iterations = 2, …), …).
trained Machine; caches model-specific representations of data
  model: CatBoostRegressor(iterations = 2, …)
  args:
    1: Source @487 ⏎ Table{AbstractVector{Count}}
    2: Source @087 ⏎ AbstractVector{Continuous}

# Get predictions
julia> preds_class = MLJBase.predict(mach, eval_data)
2-element Vector{Float64}:
15.625
18.125
julia> serializable_fitresult = MLJBase.save(mach, mach.fitresult)
Python: <catboost.core.CatBoostRegressor object at 0x7e27fff9e1e0>
julia> restored_fitresult = MLJBase.restore(mach, serializable_fitresult)
Python: <catboost.core.CatBoostRegressor object at 0x7e27fff9e1e0>

The last line failed with:

It seems the error is due to the following line in mlj_serialization.jl:

This is very likely fixed if you have the very latest version of MLJFlow.jl (0.4.1).

Okay, maybe not; you're not using MLJ. I'll take a look soon.

The MMI.save and MMI.restore functions work with the Machine's fitresult. I can look into adding support for serializing the entire Machine; it was set up this way because we have to use catboost's Python interface to save/load the models.
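For reference, a minimal sketch of that fitresult-level workaround, using the same MLJBase.save/MLJBase.restore calls as the MWE above; persisting via Julia's stdlib Serialization and the file name catboost_fitresult.jls are illustrative assumptions, not part of the CatBoost.jl API:

julia> using Serialization

# Convert the wrapped Python model into a form Julia can serialize
julia> serializable = MLJBase.save(mach, mach.fitresult);

# Persist and reload with stdlib Serialization (assumed file name)
julia> serialize("catboost_fitresult.jls", serializable);

julia> loaded = deserialize("catboost_fitresult.jls");

# Rebuild a live Python CatBoost model from the serialized form
julia> fitresult = MLJBase.restore(mach, loaded);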
In case it's relevant: https://juliaai.github.io/MLJModelInterface.jl/dev/serialization/
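For comparison, the whole-Machine path would look roughly like the sketch below; this assumes the generic MLJBase.save(file, mach) / machine(file) API from the MLJ docs, and is presumably the path that currently fails for CatBoost models:

julia> MLJBase.save("mach.jls", mach)     # serialize the entire Machine to disk

julia> mach2 = machine("mach.jls")        # restore it in a fresh session

julia> MLJBase.predict(mach2, eval_data)  # restored machine should predict as before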