JuliaAI / MLJTuning.jl

Hyperparameter optimization algorithms for use in the MLJ machine learning framework
MIT License
66 stars 12 forks source link
grid-search hyperparameter-optimization julia machine-learning mlj random-search

MLJTuning

Hyperparameter optimization for MLJ machine learning models.

See Tuning Models · MLJ for usage examples.

Build Status codecov.io

Contents

Note: This component of the MLJ stack applies to MLJ versions 0.8.0 and higher. Prior to 0.8.0, tuning algorithms resided in MLJ.

Who is this repo for?

This repository is not intended to be directly imported by the general MLJ user. Rather, MLJTuning is a dependency of the MLJ machine learning platform, which allows MLJ users to perform a variety of hyperparameter optimization tasks from there.

MLJTuning is the place for developers to integrate hyperparameter optimization algorithms (here called tuning strategies) into MLJ, either by adding code to /src/strategies, or by importing MLJTuning into a third-party package and implementing MLJTuning's tuning strategy interface.

MLJTuning is a component of the MLJ stack which does not have MLJModels as a dependency (no ability to search and load registered MLJ models). It does however depend on MLJBase and, in particular, on the resampling functionality currently residing there.

What is provided here?

This repository contains:

How do I implement a new tuning strategy?

This document assumes familiarity with the Evaluating Model Performance and Performance Measures sections of the MLJ manual. Tuning itself, from the user's perspective, is described in Tuning Models.

Overview

What follows is an overview of tuning in MLJ. After the overview is an elaboration on those terms given in italics.

All tuning in MLJ is conceptualized as an iterative procedure, each iteration corresponding to a performance evaluation of a single model. Each such model is a mutated clone of a fixed prototype. In the general case, this prototype is a composite model, i.e., a model with other models as hyperparameters, and while the type of the prototype mutations is fixed, the types of the sub-models are allowed to vary.

When all iterations of the algorithm are complete, the optimal model is selected by applying a selection heuristic to a history generated according to the specified tuning strategy. Iterations are generally performed in batches, which are evaluated in parallel (sequential tuning strategies degenerating into semi-sequential strategies, unless the batch size is one). At the beginning of each batch, both the history and an internal state object are consulted, and, on the basis of the tuning strategy, a new batch of models to be evaluated is generated. On the basis of these evaluations, and the strategy, the history and internal state are updated.

The tuning algorithm initializes the state object before iterations begin, on the basis of the specific strategy and a user-specified range object.

Interface points for user input

Recall, for context, that in MLJ tuning is implemented as a model wrapper. A model is tuned by fitting the wrapped model to data (which also trains the optimal model on all available data). This process determines the optimal model, as defined by the selection heuristic (see above). To use the optimal model one predicts using the wrapped model. For more detail, see the Tuning Models section of the MLJ manual.

In setting up a tuning task, the user constructs an instance of the TunedModel wrapper type, which has these principal fields:

Implementation requirements for new tuning strategies

As sample implementations, see /src/strategies/

Summary of functions

Several functions are part of the tuning strategy API:

Important note on the history. The initialization and update of the history is carried out internally, i.e., is not the responsibility of the tuning strategy implementation. The history is always initialized to nothing, rather than an empty vector.

The above functions are discussed further below, after discussing types.

The tuning strategy type

Each tuning algorithm must define a subtype of TuningStrategy whose fields are the hyperparameters controlling the strategy that do not directly refer to models or model hyperparameters. These would include, for example, the default resolution of a grid search, or the initial temperature in simulated annealing.

The algorithm implementation must include a keyword constructor with defaults. Here's an example:

mutable struct Grid <: TuningStrategy
    goal::Union{Nothing,Int}
    resolution::Int
    shuffle::Bool
    rng::Random.AbstractRNG
end

# Constructor with keywords
Grid(; goal=nothing, resolution=10, shuffle=true,
     rng=Random.GLOBAL_RNG) =
    Grid(goal, resolution, MLJBase.shuffle_and_rng(shuffle, rng)...)

Range types

Generally new types are defined for each class of range object a tuning strategy should like to handle, and the tuning strategy functions to be implemented are dispatched on these types. It is recommended that every tuning strategy support at least these types:

Recall that ParamRange has two concrete subtypes NumericRange and NominalRange, whose instances are constructed with the MLJBase extension to the range function.

Note in particular that a NominalRange has a values field, while NumericRange has the fields upper, lower, scale, unit and origin. The unit field specifies a preferred length scale, while origin a preferred "central value". These default to (upper - lower)/2 and (upper + lower)/2, respectively, in the bounded case (neither upper = Inf nor lower = -Inf). The fields origin and unit are used in generating grids or fitting probability distributions to unbounded ranges.

A ParamRange object is always associated with the name of a hyperparameter (a field of the prototype in the context of tuning) which is recorded in its field attribute, a Symbol, but for composite models this might be a be an Expr, such as :(atom.max_depth).

Use the iterator and sampler methods to convert ranges into one-dimensional grids or for random sampling, respectively. See the tuning section of the MLJ manual or doc-strings for more on these methods and the Grid and RandomSearch implementations.

The clean! method: For validating tuning strategy

MLJTuning.clean!(tuning::MyTuningStrategy)

Some tuning strategies have mutable fields that only support specific set of values: a particle swarm strategy, for instance, should have at least three agents for the algorithm to work. As such, it is recommended to implement the clean! method to warn the user and correct invalid tuning hyperparameters. The method should return a string message if some fields have been reset or an empty string otherwise, and will be called internally whenever a TunedModel machine is fit!.

The default fallback for clean! returns an empty string.

The setup method: To initialize state

state = setup(tuning::MyTuningStrategy, model, range, n, verbosity)

The setup function is for initializing the state of the tuning algorithm (available to the models method). Be sure to make this object mutable if it needs to be updated by the models method. The arguments model and n are what the user has specified in their TunedModel instance; recall model is the prototype to be cloned and mutated, and n the total number of mutations to be generated.

The state is a place to record the outcomes of any necessary intialization of the tuning algorithm (performed by setup) and a place for the models method to save and read transient information that does not need to be recorded in the history.

The setup function is called once only, when a TunedModel machine is fit! the first time, and not on subsequent calls (unless force=true). (Specifically, MLJBase.fit(::TunedModel, ...) calls setup but MLJBase.update(::TunedModel, ...) does not.)

The verbosity is an integer indicating the level of logging: 0 means logging should be restricted to warnings, -1, means completely silent.

The fallback for setup is:

setup(tuning::TuningStrategy, model, range, n, verbosity) = range

However, a tuning strategy will generally want to implement a setup method for each range type it is going to support:

MLJTuning.setup(tuning::MyTuningStrategy, model, range::RangeType1, n, verbosity) = ...
MLJTuning.setup(tuning::MyTuningStrategy, model, range::RangeType2, n, verbosity) = ...
etc.

The extras method: For adding user-inspectable data to the history

MLJTuning.extras(tuning::MyTuningStrategy, history, state, E) -> named_tuple

This method should return any user-inspectable information to be included in a new history entry, that is in addition to the model, measures, measurement and per_fold data. This method must return a named tuple, human readable if possible. Each key of the returned named tuple becomes a key of the new history entry.

Here E is the full evalutation object for model and history the current history (before adding the new entry).

The fallback for extras returns an empty named tuple.

The models method: For generating model batches to evaluate

MLJTuning.models(tuning::MyTuningStrategy, model, history, state, n_remaining, verbosity)
    -> vector_of_models, new_state

This is the core method of a new implementation. Given the existing history and state, it must return a vector ("batch") of new model instances vector_of_models to be evaluated, and the updated state. Any number of models may be returned (and this includes an empty vector or nothing) and the evaluations will be performed in parallel (using the mode of parallelization defined by the acceleration field of the TunedModel instance).

Important note. Parallelization means the order in which the history gets extended after models(...) returns its list of new candidates is generally not the same order in which the candidates are returned. Some implementations may therefore need to attach extra "labeling" metadata to each model, as explained below, so that the existing history can be suitably interpreted.

If more models are returned than needed (because including them would create a history whose length exceeds the user-specified number of iterations tuned_model.n) then the surplus models are saved, for use in a "warm restart" of tuning, when the user increases tuned_model.n. The remaining models are then evaluated and these evaluations are added to the history. In any warm restart, no new call to models will be made until all saved models have been evaluated, and these evaluations added to the history.

If the tuning algorithm exhausts it's supply of new models (because, for example, there is only a finite supply, as in a Grid search) then vector_of_models should be an empty vector or nothing. The interface has no fixed "batch-size" parameter, and the tuning algorithm is happy to receive any number of models; a surplus is handled as explained above, a shortfall will trigger an early stop (so that the final history has length less than tuned_model.n).

If needed, extra metadata may be attached to each model returned; see below.

Sequential tuning strategies generating models non-deterministically (e.g., simulated annealing) might choose to include a batch size hyperparameter, and arrange that models returns batches of the specified size (to be evaluated in parallel when acceleration is set appropriately). However, the evaluations and history updates do not occur until after the models call, so it may be complicated or impossible to preserve the original (strictly) sequential algorithm in that case, which should be clearly documented.

Some simple tuning strategies, such as RandomSearch, will want to return as many models as possible in one hit. To this end, the variable n_remaining is passed to the models call; this is the difference between the current length of the history and tuned_model.n.

Including model metadata

If a tuning strategy implementation needs to record additional metadata in the history, for each model generated, then instead of model instances, vector_of_models should be vector of tuples of the form (m, metadata), where m is a model instance, and metadata the associated data. To access the metadata for the jth element of the existing history, use history[j].metadata.

The tuning_report method: To add to the user-inspectable report

As with any model, fitting a TunedModel instance generates a user-accessible report. Note that the fallback report already includes additions to the history created by the extras method mentioned above. To add more strategy-specific information to the report, one overloads tuning_report.

Specically, the report generated by fitting a TunedModel is constructed with this code:

report1 = (best_model         = best_model,
           best_history_entry = best_user_history_entry,
           history            = user_history,
           best_report        = best_report)

report = merge(report1, tuning_report(tuning, history, state))

where:

The fallback for tuning_report returns an empty named-tuple.

The default_n method: For declaring the default number of iterations

MLJTuning.default_n(tuning::MyTuningStrategy, range)

The models method, which is allowed to return multiple models in it's first return value vector_of_models, is called until one of the following occurs:

The fallback is

default_n(tuning::TuningStrategy, range) = DEFAULT_N

where DEFAULT_N is a global constant. Do using MLJTuning; MLJTuning.DEFAULT_N to see check the current value.

The supports_heuristic trait

If you define a selection heuristic SpecialHeuristic (see below) and that heuristic is specific to a tuning strategy TuningStrategy then you must define

MLJTuning.supports_heuristic(::TuningStrategy, ::SpecialHeuristic) = true

Sample implementations

A number of built-in tuning strategy implementations of the MLJTuning API can be found at /src/strategies.

The simplest Explicit strategy is the simplest but is also an odd case as the range is just an iterator of MLJ models. These models need not share a common type and model is never cloned.

For slightly less trivial example, see the Grid search code

How do I implement a new selection heuristic?

Recall that a selection heuristic is a rule which decides on the "best model" given the model evaluations in the tuning history. New heuristics are introduced by defining a new struct SomeHeuristic subtyping SelectionHeuristic and implementing a method

MLJTuning.best(heuristic::SomeHeuristic, history) -> history_entry

where history_entry is the entry in the history corresponding to the model deemed "best".

Below is a simplified version of code defining the default heuristic NaiveSelection() which simply chooses the model with the lowest (or highest) aggregated performance estimate, based on the first measure specified by the user in his TunedModel construction (she may specify more than one).

struct NaiveSelection <: MLJTuning.SelectionHeuristic end

function best(heuristic::NaiveSelection, history)
    measurements = [h.measurement[1] for h in history]
    measure = first(history).measure[1]
    if orientation(measure) == :score
        measurements = -measurements
    end
    best_index = argmin(measurements)
    return history[best_index]
end

Because this selection heuristic is generic (applies to all tuning strategies) we additionally define

MLJTuning.supports_heuristic(strategy, heuristic::NaiveSelection) = true

For strategy-specific selection heuristics, see above on how to set this trait.