JuliaAI / MLJ


Content suggestion for features #14

Closed ablaom closed 1 month ago

ablaom commented 3 months ago

Since I had some trouble adding markdown content, I'm including this here for now:


Matching models to tasks

A Model Registry stores detailed metadata for over 200 models, and model documentation can be searched without loading the model-defining code.

julia> X, y = @load_iris
julia> models(matching(X, y))
54-element Vector
 (name = AdaBoostClassifier, package_name = MLJScikitLearnInterface, ... )
 (name = AdaBoostStumpClassifier, package_name = DecisionTree, ... )
 (name = BaggingClassifier, package_name = MLJScikitLearnInterface, ... )
 ⋮

julia> models("pca")
 (name = PCA, package_name = MultivariateStats, ... )
 (name = PCADetector, package_name = OutlierDetectionPython, ... )
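Once a candidate is identified, its document string can be retrieved from the registry and its code loaded, as in the following sketch (it assumes the DecisionTree.jl interface package is installed; the bound names are illustrative):

julia> doc("DecisionTreeClassifier", pkg="DecisionTree")  # docs without loading model code
julia> Tree = @load DecisionTreeClassifier pkg=DecisionTree
julia> tree = Tree(max_depth=3)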

Tuning is a wrapper

For improved composability, and to mitigate data hygiene issues, a large number of meta-algorithms are implemented as model wrappers. In this way, a model wrapped in a tuning strategy, for example, becomes a "self-tuning" model, with all data resampling (e.g., cross-validation) managed under the hood.

model = XGBoostRegressor()
r1 = range(model, :max_depth, lower=3, upper=10)
r2 = range(model, :gamma, lower=0, upper=10, scale=:log)
tuned_model = TunedModel(model, range=[r1, r2], resampling=CV(), measure=l2)

# optimise and retrain on all data:
mach = machine(tuned_model, X, y) |> fit!

predict(mach, Xnew)      # prediction using optimized params
report(mach).best_model  # inspect optimisation results
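Because the wrapped model is itself an ordinary model, it can also be passed to evaluate, giving a nested-resampling estimate of the tuned model's performance. A rough sketch, assuming X and y hold the training data:

# the outer resampling evaluates the self-tuning model; the CV specified
# in TunedModel is used internally to optimise the hyper-parameters:
evaluate(tuned_model, X, y, resampling=CV(nfolds=3), measure=l2)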

Tunable nested parameters

Creating pipelines, or wrapping models in meta-algorithms such as iteration control, creates nested hyper-parameters, which can be optimized like any other.

julia> pipe = ContinuousEncoder() |> RidgeRegressor()
DeterministicPipeline(
  continuous_encoder = ContinuousEncoder(
        drop_last = false,
        one_hot_ordered_factors = false),
  ridge_regressor = RidgeRegressor(
        lambda = 1.0,
        fit_intercept = true,
        penalize_intercept = false,
        scale_penalty_with_samples = true,
        solver = nothing),
  cache = true)

julia> r = range(pipe, :(ridge_regressor.lambda), lower=0.001, upper=10.0)
julia> tuned_model = TunedModel(pipe, range=r, resampling=CV(), measure=l2)
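Fitting the wrapper then optimises the nested parameter; a sketch, assuming X and y are available:

julia> mach = machine(tuned_model, X, y) |> fit!
julia> report(mach).best_model.ridge_regressor.lambda  # optimised nested parameter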

Smart pipelines

Conventional model pipelines are available out of the box. Hyper-parameters of the different model components can be tuned simultaneously, but only the necessary components are retrained in each pipeline evaluation. Training reports expose the reports of individual components, and the same holds for learned parameters.

pipe = OneHotEncoder() |> PCA(maxoutdim=3) |> DecisionTreeClassifier()
mach = machine(pipe, X, y) |> fit!

# get actual PCA reduction dimension:
report(mach).pca.outdim

# get the tree:
fitted_params(mach).decision_tree_classifier.tree
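To illustrate the selective retraining, here is a sketch (not verbatim output): mutating a hyper-parameter of the final component and refitting retrains the tree only, re-using the encoding and PCA projection learned earlier.

# only the tree is retrained; upstream machines are not touched:
pipe.decision_tree_classifier.max_depth = 3
fit!(mach)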

Iteration control

MLJ provides a rich supply of iterative model "controls", such as early stopping criteria, snapshots, and callbacks for visualization. Any model with an iteration parameter can be wrapped in such controls, with the iteration parameter becoming an additional learned parameter.

model = EvoTreeRegressor()
controls = [Step(1), Patience(5), TimeLimit(1/60), InvalidValue()]

iterated_model = IteratedModel(
    model;
    controls,
    measure=l2,
    resampling=Holdout(),
    retrain=true,
)

# train on holdout to find `nrounds` and retrain on all data:
mach = machine(iterated_model, X, y) |> fit!
predict(mach, Xnew) # predict on new data
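Callbacks slot into the same list of controls. A sketch using the WithLossDo control (provided with MLJ's iteration controls) to record out-of-sample losses for later plotting:

losses = Float64[]

iterated_model = IteratedModel(
    model;
    controls=[Step(1), WithLossDo(f=loss -> push!(losses, loss)), Patience(5)],
    measure=l2,
    resampling=Holdout(),
)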

Composition beyond pipelines

In principle, any MLJ workflow is readily transformed into a lazily executed learning network.

For example, in the code block below, fit! triggers training of both models in parallel. Mutate a hyper-parameter of model1, call fit! again, and only model1's learned parameters are updated.

Learning networks can be exported as new stand-alone model types. MLJ's pipelines and stacks are actually implemented using learning networks.

X, y = source(X), source(y) # wrap data in "source nodes"

# a normal MLJ workflow, with training omitted:
mach1 = machine(model1, X, y)
mach2 = machine(model2, X, y)
y1 = predict(mach1, X) # a callable "node"
y2 = predict(mach2, X)

yhat = 0.5*(y1 + y2) # a new node blending the two predictions
fit!(yhat, acceleration=CPUThreads())

yhat(Xnew) # blended prediction on new data
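As a rough sketch of the export step, using the network-composite API (the names here are illustrative assumptions, not the canonical documentation example), the blended network above might be wrapped as a new model type like this:

import MLJBase

mutable struct Blended <: MLJBase.DeterministicNetworkComposite
    model1
    model2
end

function MLJBase.prefit(::Blended, verbosity, X, y)
    Xs, ys = source(X), source(y)
    m1 = machine(:model1, Xs, ys)   # symbols refer to the struct's fields
    m2 = machine(:model2, Xs, ys)
    yhat = 0.5*(predict(m1, Xs) + predict(m2, Xs))
    return (; predict=yhat)        # the network's learning network interface
end

# Blended(model1, model2) now behaves like any other model: it can be fit,
# evaluated, wrapped in TunedModel, and so on.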
EssamWisam commented 1 month ago

Closed since #13