JuliaAI / MLJ.jl

A Julia machine learning framework
https://juliaai.github.io/MLJ.jl/

Sharing Hyper Parameters #628

Open leonardtschora opened 3 years ago

leonardtschora commented 3 years ago

Hi everyone, as usual, thanks for helping me out on those issues!

I have two models that are NOT of the same type, but they share some common hyperparameters (not all of them: the set of shared hyperparameters is a strict subset of each model's set of hyperparameters).

As I understand it from Learning Networks

different machines can wrap a common model (allowing for hyperparameters in different machines to be coupled)

It is indeed possible to write

mutable struct A <: Unsupervised
    hp1::Int
    hp2::Int
end

Xs = source(rand(100))

a = A(10, 10)
ma1 = machine(a, Xs)
ma2 = machine(a, Xs)

And both ma1 and ma2 will have to be retrained if one modifies hp1 or hp2, enabling an easy hyper-parameter search.

However, my case is slightly more complex:

mutable struct B <: Unsupervised
    hp1::Int # Shared Hyper-Parameter
    hp3::Int
end

Note: it does not really matter what A and B are in terms of model type; they can be Unsupervised, Supervised, etc. The idea is that they are both used sequentially to produce an output, and I want to find the optimal set of hyper-parameters for this learning network.

Here is a summary of the situation (see the image attached to the original issue).
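To make this concrete, here is a minimal sketch of the kind of network I have in mind (A and B as defined above; their fit/transform implementations are omitted and the wiring is only illustrative):

# Minimal, illustrative wiring only -- fit/transform for A and B are not shown
Xs = source(rand(100))

a = A(10, 10)            # hp1 = 10 (shared), hp2 = 10
b = B(10, 20)            # hp1 = 10 (should always equal a.hp1), hp3 = 20

ma = machine(a, Xs)
out_a = transform(ma, Xs)

mb = machine(b, out_a)   # B consumes A's output: the two models are used sequentially
xout = transform(mb, out_a)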

I might have missed something that would make this work easily; I am still new to MLJ. If it is currently not possible, my suggestion would be to allow passing a list of hyperparameters to the range function. The idea would be to give the same values to the different specified hyperparameters during tuning. The solution would then be:

range(xout, [:(a.hp1), :(b.hp2)], ....)

If I did not make myself clear enough, just let me know and I'll try to explain it better.

Thanks a lot for your help and support.

leonardtschora commented 3 years ago

So here is a workaround for this problem: I wrapped the models that need to share hyper-parameters into a composite model that also carries those hyper-parameters. Then I redefined the update method of this wrapper, so that every time the wrapper's hyper-parameters change, the sub-models are refitted as well.

Here is the Sub model:

mutable struct Sub <: Unsupervised
    hp1::Int
end

function MLJ.fit(s::Sub, verbosity::Int, X)
    fitresult = s.hp1
    return fitresult, nothing, nothing
end

function MLJ.transform(s::Sub, fitresult, X)
    return fitresult * X
end

And here is the wrapper:

mutable struct C <: UnsupervisedComposite
    hp1::Int
end

function MLJ.fit(c::C, verbosity::Int, X)
    Xs= source(X)

    s1 = Sub(c.hp1)    
    m1 = machine(s1, Xs)       
    out1 = transform(m1, Xs)

    s2 = Sub(c.hp1)
    m2 = machine(s2, Xs)       
    out2 = transform(m2, Xs)    

    out = 0.5 * (out1 + out2)

    fit!(out)
    fitresult = out
    cache = (s1=s1, s2=s2)
    return fitresult, cache, nothing
end

function MLJ.transform(model::C, fitresult::Node, X)
    return fitresult(X)
end

function MLJ.update(c::C, verbosity::Int, old_fitresult::Node, old_cache, X)
    # c's hp1 has changed, need to update the submodel's hp1.
    old_cache.s1.hp1 = c.hp1
    old_cache.s2.hp1 = c.hp1

    out = old_fitresult
    fit!(out)

    fitresult = out
    cache = old_cache
    return fitresult, cache, nothing
end

Then, we can run the following piece of code:

c = C(2)
m = machine(c, X)
fit!(m)
transform(m, X)

c.hp1 = 3
fit!(m)

The last call to fit!(m) gives

[ Info: Updating Machine{C4} @324.
[ Info: Updating Machine{Sub} @269.
[ Info: Updating Machine{Sub} @649.
Machine{C4} @324 trained 2 times.
  args:
    1:  Source @955 ⏎ `AbstractArray{ScientificTypes.Continuous,2}`

Note that if we don't redefine update, then calling fit! after changing the model does not retrain the two submodels.

Let me know what you think about this. (I still think there should be simpler ways to solve this problem, such as having a "double range" that forces two hyper-parameters of the same model to take the same value, or maybe declaring submodel instantiation as nodes so that MLJ knows they have to be retrained when hyper-parameters change?) Thanks.

OkonSamuel commented 3 years ago

@Leonardbcm . You have pretty much figured this out. I just have the following addition to make.

In MLJ, composite models have a fallback update method for user-defined composite models that don't implement an update method. This fallback update method (which works great in most cases) either rebuilds the entire network or updates it, depending on the following heuristic. See https://github.com/alan-turing-institute/MLJBase.jl/blob/master/src/composition/models/methods.jl#L49

# If any `model` field has been replaced (and not just mutated)
# then we actually need to fit rather than update (which will
# force build of a new learning network).
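In other words (using a hypothetical composite with a model-valued field; this is not code from MLJBase, just an illustration of the heuristic):

# Mutating a model-valued field keeps the same object (same objectid),
# so the fallback can smart-update the existing machines:
composite.submodel.lambda = 0.5

# Rebinding the field to a brand-new model changes its objectid,
# so the fallback refits, rebuilding the learning network:
composite.submodel = SomeModel(lambda=0.5)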

In your example above, the hp1 parameter of the learning network isn't an MLJ.Model instance but rather an Integer, so changing that parameter won't trigger a refit of the entire network: there are no model::MLJ.Model fields to begin with, let alone any to check for replacement versus mutation. One way to force refitting of the submodels is to write your own update function, as you have already done above; another approach (not better in any way) is shown below.

mutable struct B <: MLJ.Model
  hp1::Int
end

mutable struct Sub <: Unsupervised
    hp1::Int
end

function MLJ.fit(s::Sub, verbosity::Int, X)
    fitresult = s.hp1
    return fitresult, nothing, nothing
end

function MLJ.transform(s::Sub, fitresult, X)
    return fitresult * MLJ.matrix(X)
end

Xs= source(X)
hp1 = B(2)
s1 = Sub(hp1.hp1)    
m1 = machine(s1, Xs)       
out1 = transform(m1, Xs)

s2 = Sub(hp1.hp1)
m2 = machine(s2, Xs)       
out2 = transform(m2, Xs)    

out = 0.5 * (out1 + out2)

mach = machine(Unsupervised(), Xs; transform=out)
fit!(mach, verbosity=1)

@from_network mach begin
    mutable struct D
       hp1::B = hp1
    end
end

julia> X = table(rand(100,5));

julia> d = D(B(2))
D(
    hp1 = B(
            hp1 = 2)) @443

julia> m =machine(d, X)
Machine{D} @724 trained 0 times.
  args: 
    1:  Source @412 ⏎ `Table{AbstractArray{Continuous,1}}`

julia> fit!(m)
[ Info: Training Machine{D} @724.
[ Info: Training Machine{Sub} @376.
[ Info: Training Machine{Sub} @703.
Machine{D} @724 trained 1 time.
  args: 
    1:  Source @412 ⏎ `Table{AbstractArray{Continuous,1}}`

julia> transform(m, X);

julia> d.hp1 = B(3)
B(
    hp1 = 3) @901

julia> fit!(m)
[ Info: Updating Machine{D} @724.
[ Info: Training Machine{Sub} @746.
[ Info: Training Machine{Sub} @155.
Machine{D} @724 trained 2 times.
  args: 
    1:  Source @412 ⏎ `Table{AbstractArray{Continuous,1}}`

@ablaom is there anything I am missing??

leonardtschora commented 3 years ago

Thanks a lot for this detailed answer @OkonSamuel. If I understand correctly, your idea is to wrap the hyper-parameters inside a model, so that the default update function for composite models understands that something has changed in the sub-models and triggers their refitting. This seems a lot more elegant than re-implementing the update function (which can turn into copy-pasting lines from the fit function and causes logic duplication, which is quite bad for maintenance). However, in your example, it seems that the sub-models are not retrained: new machines are created and fitted from scratch. In your second call to fit!(m), we can see that the machine numbers have changed, and the verbosity shows

[ Info: Training Machine{Sub} @746.

instead of

[ Info: Updating Machine{Sub} @746.

Which is what I'd like to obtain. This is still quite a step forward, let me know if there could be a quick way to solve this, I'll try things on my own.

Thanks for your support.

OkonSamuel commented 3 years ago

However, in your example, it seems that the Sub-Models are not retrained : new machines are created and they are fitted from scratch (In your second call to fit!(m)

Yeah, you're right, it creates a new learning network. (It's a consequence of the default update method.)

leonardtschora commented 3 years ago

So the only option is to re-implement an update method? I have read the implementation of the default update method, and particularly

If any model field has been replaced (and not just mutated) then we actually need to fit rather than update (which will force build of a new learning network).

seems to indicate that we should just mutate the Hyper-Parameter container:

 d.hp1.hp1 = 3
fit!(m)

However, this also causes the reconstruction of a new learning network.

OkonSamuel commented 3 years ago

So the only option is to re-implement an update method?

Yes

OkonSamuel commented 3 years ago

However, this also causes the reconstruction of a new learning network.

This is a consequence of the update fallback method, particularly this line: if !issubset(submodel_ids, network_model_ids)

leonardtschora commented 3 years ago

Yes, I was executing the `update` function line by line, and it seems that in our case,

!issubset(submodel_ids, network_model_ids)

returns true (thus refitting everything) because the submodels are not attributes of the Top Model D.

the code:

network_model_ids = objectid.(models(glb_node))

submodels    = filter(f->f isa Model, field_values)
submodel_ids = objectid.(submodels)

preceding if !issubset(submodel_ids, network_model_ids) shows that the model does not "understand" that the B object wrapping the hyper-parameters is part of the fitting process: it does not appear in the glb of the fitresult.

leonardtschora commented 3 years ago

OK, I just realized something really obvious that changes everything: if two different models hold the same object as a hyper-parameter, then they will automatically be synchronised.

Moreover, the other problem we had is that

the model does not "understand" that the B object wrapping the hyper-parameters is part of the fitting process

This problem can be avoided by simply removing the B object from the top model. This way, the set of model-valued hyper-parameters of the top model stays a subset of the network's models, so refitting from scratch is not triggered.

The synchronization of the hyper-parameters in the submodels is handled automatically, because they both hold a reference to the same mutable object rather than a hard-coded value.

Here is the solution I implemented below:

A dummy mutable structure to hold a shared hyper-parameter

mutable struct HyperParameterWrapper
    hyper_param::Any
end

Two sub-models that need to share a common hyper-parameter hp1. I also gave each an independent hyper-parameter (hp2 and hp3 respectively) to show how this solution covers cases that having two machines pointing to the same model can't (see my first post in this issue).

mutable struct SubModel1 <: Unsupervised
    hp1::HyperParameterWrapper
    hp2::Int
end

function MLJ.fit(s::SubModel1, verbosity::Int, X)
    fitresult = s.hp2 / s.hp1.hyper_param
    return fitresult, nothing, nothing
end

function MLJ.transform(s::SubModel1, fitresult, X)
    return fitresult * X
end

mutable struct SubModel2 <: Unsupervised
    hp1::HyperParameterWrapper
    hp3::Int
end

function MLJ.fit(s::SubModel2, verbosity::Int, X)
    fitresult = s.hp3 / s.hp1.hyper_param
    return fitresult, nothing, nothing
end

function MLJ.transform(s::SubModel2, fitresult, X)
    return fitresult * X
end

Now a little testing before going further:

X = rand(100, 4)
Xs = source(X)

# Try the Submodels
s = SubModel1(HyperParameterWrapper(42), 41)
m = machine(s, Xs)
fit!(m)
transform(m, X)

s = SubModel2(HyperParameterWrapper(42), 43)
m = machine(s, Xs)
fit!(m)
transform(m, X)

hp = HyperParameterWrapper(42)
s1 = SubModel1(hp, 41)
s2 = SubModel2(hp, 41)

m1 = machine(s1, Xs)
fit!(m1)

m2 = machine(s2, Xs)
fit!(m2)

s1.hp1.hyper_param = 41
fit!(m2) # A change in s1 is propagated to s2!
s1.hp1 == s2.hp1

The hyper-parameter synchronisation is handled, let's wrap everything in a top model:

mutable struct ACompositeModel <: UnsupervisedComposite
    hp::Int # A hyper-parameter that has nothing to do with the submodels
    s1::SubModel1
    s2::SubModel2
end

function MLJ.fit(model::ACompositeModel, verbosity::Int, X)
    Xs= source(X)

    # Make sure that hyper-parameters of the submodels are synchronized in case
    # of refit from scratch or bad initialization
    model.s2.hp1 = model.s1.hp1

    # Create SubModel's machines and add the transformation operation
    m1 = machine(model.s1, Xs)
    out1 = transform(m1, Xs)    
    m2 = machine(model.s2, Xs)
    out2 = transform(m2, Xs)

    # Sum both outputs
    out = out1 + out2

    # Using the Submodel-independant hyperparameter
    out = model.hp * out

    mach = machine(Unsupervised(), Xs; transform=out)
    fit!(mach)

    return mach()
end

Note that, for safety, I added the line model.s2.hp1 = model.s1.hp1 to make sure that the submodels' shared hyper-parameters point to the same object.

Try the composite model:

comp = ACompositeModel(1, SubModel1(HyperParameterWrapper(42), 41),
                        SubModel2(HyperParameterWrapper(43), 43))
mach = machine(comp, Xs)
fit!(mach)
transform(mach, X);

comp.s1.hp1 == comp.s2.hp1
comp.s1.hp1 === comp.s2.hp1

The hyper-parameter is well shared, even if the composite model has been badly initialized! And now the most important part:

# Mutate the shared hyper-parameter
comp.s1.hp1.hyper_param = 1
fit!(mach)
transform(mach, X);

Yeah, only updating, no refitting from scratch!

Let's change other hyper-parameters:

# Change the Submodel-independant hyper-parameter
comp.hp = 2
fit!(mach)
transform(mach, X);

# Change the Submodels, not shared, hyper-parameters
comp.s1.hp2 = 2
fit!(mach)
transform(mach, X);

comp.s2.hp3 = 3
fit!(mach)
transform(mach, X);

You can see that both sub-models are only updated and are not retrained from scratch. Finally, while this solution theoretically does what I want, I have not yet tried it on my actual models, only on these toy problems. Moreover, it can easily be broken:

comp.s1.hp1 = HyperParameterWrapper(42)
fit!(mach)
comp.s1
comp.s2

Here, we replace the shared hyper-parameter on s1's side. Since the top model had no sub-model replaced, only mutated, update does its job by not refitting the top model, and so nothing ensures that both submodels still point to the same hyper-parameter.

Let me know what you think of it; I will keep you posted on whether or not it works on "real" problems.

OkonSamuel commented 3 years ago

Let me know what you think of it,

At some point I thought about this. Yeah, this also works, but it has the limitation that both SubModel1 and SubModel2 have to be defined by you (i.e. you can't make use of existing models). If, say, you have two regressors, Regressor1 and Regressor2 from two different packages, sharing the same hyper-parameter lambda, then this won't work, because it requires lambda to be of type HyperParameterWrapper rather than the Float64 that most implementations use. I hope you get what I mean. Provided that SubModel1 and SubModel2 are implemented by you, it works great.

leonardtschora commented 3 years ago

That's correct. This is OK for my use-case though, as I need this specific feature for a particular custom model; it will indeed not work for already-implemented models.

I still think the best way to tackle the problem is to implement ranges that modify several parameters with the same value. This way you don't care whether or not the model was implemented by you, and you don't need to bother wrapping your hyper-parameters. You could even "link" hyper-parameters with different names, etc.

I have no idea if this would be hard to do. I might try and have a look at it but from what I've seen, this implies recoding a new tuning strategy and it seems complicated.

Anyway, thanks for the feedback!

leonardtschora commented 3 years ago

Thanks to how flexibly the code is implemented, it was not that hard to figure this out!

Just to sum up before beginning: the idea is always to "tie" hyper-parameters together so that when one is changed, the others change too. And of course, the difficulty arises when one wants to tune a model: you certainly do not want to refit your model from scratch, but rather to take advantage of the updating workflow proposed by MLJ.

While our previous discussion led to one solution (wrapping hyper-parameters so that they are programmatically bound), the following takes a completely different point of view and creates a "bound range". In short, this is just a way of telling a tuning strategy (Grid or other) that a range of values has to be applied to several hyper-parameters of the model, nested or not.

The general idea is being able to create a range:

range(model, [:hp1, :hp2], ....)

that, when used in a tuning strategy, will update both model.hp1 and model.hp2 with the same values defined by the range.

Back to the code: the main thing to change is the field attribute of NumericRange and NominalRange (and the corresponding argument of the range function). This attribute/argument needs to also accept a Vector{Union{Symbol, Expr}}.

In MLJBase.jl/src/hyperparam/one_dimensional_ranges.jl, we need to change the field attribute of the numeric and nominal range objects:

struct NumericRange{T,B<:Boundedness,D} <: ParamRange{T}
    # Need to accept also vectors of Symbol or Expr
    field::Union{Symbol,Expr, Vector{Union{Symbol,Expr}}}
    lower::Union{T,Float64}     # Float64 to allow for -Inf
    upper::Union{T,Float64}     # Float64 to allow for Inf
    origin::Float64
    unit::Float64
    scale::D
end

struct NominalRange{T,N} <: ParamRange{T}
     # Need to accept also vectors of Symbol or Expr
    field::Union{Symbol,Expr, Vector{Union{Symbol,Expr}}}
    values::NTuple{N,T}
end

In the same file, we need to modify the range function

function Base.range(model::Union{Model, Type}, field::Union{Symbol,Expr, Vector{Union{Symbol,Expr}}};
                    values=nothing, lower=nothing, upper=nothing,
                    origin=nothing, unit=nothing, scale::D=nothing) where D
    if model isa Model
        # If field is a vector, this will call the appropriate recursive_getproperty method
        value = recursive_getproperty(model, field)

        # If "field" is a vector, then call eltype instead of typeof
        T = field isa Vector ? eltype(value) : typeof(value)
    else
        T = model
    end
    if T <: Real && values === nothing
        return numeric_range(T, D, field, lower, upper, origin, unit, scale)
    else
        return nominal_range(T, field, values)
    end
end

Also, in MLJBase.jl/src/utilities.jl, we need to specialize recursive_getproperty and recursive_setproperty! so that they work with a vector:

function recursive_getproperty(obj, fields::Vector{T}) where T<:Union{Symbol, Expr}    
    recursive_getproperty.(Ref(obj), fields)    
end

function recursive_setproperty!(obj, fields::Vector{T}, value) where T<:Union{Symbol, Expr}
    recursive_setproperty!.(Ref(obj), fields, Ref(value))
end

The idea in the range function is the following: if the field argument is a vector of the correct type, then the new recursive_getproperty is called and returns a vector of elements corresponding to the values of the specified fields in the model. Then, we need to call eltype instead of typeof. eltype has the advantage of returning a common supertype: if, for instance, there are floats and ints in the array, it will return Real, and it returns Any if there is no common supertype. The variable T is thus correctly instantiated for the rest of the code.
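Two quick checks in plain Julia that illustrate the point (independent of the MLJ code above):

typejoin(Int, Float64)   # Real -- the "common supertype" referred to above
eltype([1, "a"])         # Any  -- no useful common supertype, so a numeric range cannot be built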

Since the nominal_range and numeric_range functions do not impose a type restriction on their field argument, there is nothing to change there.

The last step is a bit annoying: we have to go into MLJTuning.jl/src/strategies/grid.jl. The function fields_iterators_and_scales initializes dictionaries keyed on the type Union{Symbol, Expr}, and we need to change it to accept vectors:

function MLJTuning.fields_iterators_and_scales(ranges, resolutions)

    # following could have non-unique entries:
    fields = map(r -> r.field, ranges)

    # The keys of those dictionaries can now be a symbol, an expr, or a vector of symbol or expr.
    iterator_given_field = Dict{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Vector}()
    scale_given_field = Dict{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Any}()
    for i in eachindex(ranges)
        fld = fields[i]
        r = ranges[i]
        if haskey(iterator_given_field, fld)
            iterator_given_field[fld] =
                vcat(iterator_given_field[fld], iterator(r, resolutions[i]))
            scale_given_field[fld] =
                _merge(scale_given_field[fld], scale(r))
        else
            iterator_given_field[fld] = iterator(r, resolutions[i])
            scale_given_field[fld] = scale(r)
        end
    end
    fields = unique(fields)
    iterators = map(fld->iterator_given_field[fld], fields)
    scales = map(fld->scale_given_field[fld], fields)

    return fields, iterators, scales
end

Here are some quick tests to check things are working in basic cases.

Test of range instantiation and of the recursive get/set property methods:

mutable struct A
    a::Int
    b::Int
end

mutable struct Dummy <: MLJBase.Model
    a::A
    b::Int
end

model = Dummy(A(1, 2), 3)

######### Check that ranges are not broken
r1 = Base.range(model, :b; lower=1, upper=30)
r1b = Base.range(Int, :b; lower=1, upper=30)
@assert r1 == r1b

As = [A(1, 1), A(3, 3)]
r2 = Base.range(model, :a; values=As)
r2b = Base.range(A, :a; values=As)
@assert r2 == r2b

######### Check binding ranges
fields = Vector{Union{Symbol,Expr}}([:b])
r3 = Base.range(model, fields; lower=1, upper=30)

fields = Vector{Union{Symbol,Expr}}([:b, :(a.b)])
r4 = Base.range(model, fields; lower=1, upper=30)

######## Try nominal ranges
mutable struct C
    a::String
    b::String
end

mutable struct NominalD <: MLJBase.Model
    a::Int
    c::C
end

model = NominalD(3, C("1", "2"))
fields = Vector{Union{Symbol,Expr}}([:a, :(c.a)])

# This should throw an error: eltype(recursive_getproperty) will return Any
# because recursive_getproperty will return an Int and a String
r4 = Base.range(model, fields; lower=1, upper=30)

r5 = Base.range(model, fields; lower=1, upper=30)

And now a test using a TunedModel:

# Generate some dummy data
X = MLJ.table(rand(100, 10));
y = 2X.x1 - X.x2 + 0.05*rand(100);

# Pick a model
tree_model = @load DecisionTreeRegressor;

# For some reason, we want our tree to have min_samples_leaf == max_depth
fields = Vector{Union{Symbol,Expr}}([:min_samples_leaf, :max_depth])
r = Base.range(tree_model, fields; lower=2, upper=10)

# Wrap the model
tuning = Grid(resolution=10, shuffle=true, rng=1234)
self_tuning_tree_model = TunedModel(model=tree_model,
                                    resampling = CV(nfolds=3),
                                    tuning = tuning,
                                    range = r,
                                    measure = rms);
self_tuning_tree = machine(self_tuning_tree_model, X, y);
fit!(self_tuning_tree, verbosity=0);

# Inspecting results
fitted_params(self_tuning_tree).best_model
report(self_tuning_tree).plotting.parameter_values

# Access attributes of fitted models in the history
getproperty.(getindex.(report(self_tuning_tree).history, Ref(1)), Ref(:max_depth))
getproperty.(getindex.(report(self_tuning_tree).history, Ref(1)), Ref(:min_samples_leaf))

And this is it! I'd really like this functionality to be added to MLJ, and I would be able to spare the time to make a 'beautiful' pull request if required! While I think most of the logic is done here, it still needs a lot of testing to check that it works in other cases. I tried to work on that, but I don't really know how to tackle a multi-package change and its testing (since we need to modify both MLJBase and MLJTuning).

I hope we can work this out together to add this feature!

ablaom commented 3 years ago

@Leonardbcm This is a really interesting use case and it's so cool to see the generic composition interface actually used to address it. Thanks to @OkonSamuel who understands this API so well.

Although this is cool, I can't imagine there being a big use-case for coupling a subset of parameters of different models (that are not custom-defined, as in your case). @Leonardbcm Could you say a little more about how your case arises?

Regarding @Leonardbcm's proposal for generalizing range objects to become "diagonal": wow, this is a pretty thorough proof-of-concept! The one thing that worries me is that each individual tuning strategy needs to be extended to handle the more general range objects. Have you thought about the Random case?

leonardtschora commented 3 years ago

Thanks for showing enthusiasm for this idea. I had no idea this concept was called "diagonal" ranges.

First of all, my use case for diagonal ranges occurs on custom models: I have two sources of data that need to be pre-processed and combined before being fed to a regressor. I want to find the set of hyper-parameters that yields the best regression metric, and some hyper-parameters have to be tied together for the network to work. For instance, there is a kind of granularity parameter that has to be the same in both "preprocessors", otherwise the data is not consistent.

A more common use-case could be the following:

Let's say you want to apply a dimensionality-reduction algorithm to your data and then feed the result into a neural network. The number of neurons in your first layer has to be tied to the number of output dimensions of your reduction transformer for this to work. Of course, these could be fixed, or you could dynamically set the number of units of the first layer during fitting, but that means complete re-training of the network even if another parameter of the transformer changes. Again, you could re-implement the update method, but that has a lot of drawbacks, mostly logic duplication.

Or, let's say you want to build a heterogeneous ensemble of models: say, an ensemble of regressors that share a hyper-parameter lambda.
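Under the proposed syntax, these two examples might read something like this (pipe, ens and the nested field names are hypothetical, not actual fields of registered MLJ models):

# Couple the reducer's output dimension to the network's input width:
fields_dims = Vector{Union{Symbol,Expr}}([:(reducer.maxoutdim), :(network.n_in)])
r_dims = range(pipe, fields_dims; lower=2, upper=20)

# Couple a regularization strength shared by two regressors in an ensemble:
fields_lambda = Vector{Union{Symbol,Expr}}([:(regressor1.lambda), :(regressor2.lambda)])
r_lambda = range(ens, fields_lambda; lower=0.01, upper=1.0, scale=:log)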

Regarding the tuning strategies, I don't think this is much of a problem since, conceptually, whether one or several parameters have to be set at each step does not really matter to the tuning strategy. What matters are the range values, how many there are, etc., and since Julia's multiple dispatch does the job for us after redefining the recursive get/set property methods, users should not need to care much about whether the "field" object they handle is a vector or an expr/symbol.

As seen above, the only thing to change in the Grid methods was the key type of some storage dictionaries; no logic changed. As for RandomSearch, I have not tried it yet, but from what I'm reading in the code, it should work after modifying MLJTuning.jl/src/range_methods.jl:

function process_random_range(user_specified_range::AbstractVector,
                              bounded,
                              positive_unbounded,
                              other)
    # r not paired:
    stand(r) = throw(ArgumentError("Unsupported range #2. "))
    stand(r::NumericRange) = stand(r, boundedness(r))
    stand(r::NumericRange, ::Type{<:Bounded}) = (r.field, sampler(r, bounded))
    stand(r::NumericRange, ::Type{<:Other}) = (r.field, sampler(r, other))
    stand(r::NumericRange, ::Type{<:PositiveUnbounded}) =
        (r.field, sampler(r, positive_unbounded))
    stand(r::NominalRange) = (n = length(r.values);
                              (r.field, sampler(r, fill(1/n, n))))
    # (r, d):
    stand(t::Tuple{ParamRange,Any}) = stand(t...)
    stand(r, d) = throw(ArgumentError("Unsupported range #3. "))
    stand(r::NominalRange, d::AbstractVector{Float64}) =  _stand(r, d)
    stand(r::NumericRange, d:: Union{DIST, Type{<:DIST}}) = _stand(r, d)
    _stand(r, d) = (r.field, sampler(r, d))

    # (field, s):
    stand(t::Tuple{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Any}) = t

    return  Tuple(stand.(user_specified_range))

    # ret = zip(stand.(user_specified_range)...) |> collect
    # return first(ret), last(ret)
end

(I have to try this out today.) Otherwise, the models! function makes use of recursive_setproperty!, which automatically falls back to the vector version when appropriate. So no tuning logic has to be changed.

I don't know how users could adapt their own tuning strategies to incorporate this change, but again, the tuning logic and which params have to be set are independent.

leonardtschora commented 3 years ago

So the following:

tuning = RandomSearch()
self_tuning_tree_model = TunedModel(model=tree_model,
                                    tuning=tuning,
                                    resampling=CV(nfolds=3),
                                    range=r,
                                    measure=rms,
                                    n=10);
self_tuning_tree = machine(self_tuning_tree_model, X, y);
fit!(self_tuning_tree, verbosity=0);

# Inspecting results
fitted_params(self_tuning_tree).best_model
report(self_tuning_tree).plotting.parameter_values

is working fine, after modifying the process_random_range function as shown above.

For the small tests I did, I was in the MLJBase environment. I modified the files directly in a cloned version of the git repository, and for the code belonging to MLJTuning, I did the following ugly thing:

import MLJTuning.fields_iterators_and_scales
function MLJTuning.fields_iterators_and_scales(ranges, resolutions)

    # following could have non-unique entries:
    fields = map(r -> r.field, ranges)

    iterator_given_field = Dict{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Vector}()
    scale_given_field = Dict{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Any}()
    for i in eachindex(ranges)
        fld = fields[i]
        r = ranges[i]
        if haskey(iterator_given_field, fld)
            iterator_given_field[fld] =
                vcat(iterator_given_field[fld], iterator(r, resolutions[i]))
            scale_given_field[fld] =
                _merge(scale_given_field[fld], scale(r))
        else
            iterator_given_field[fld] = iterator(r, resolutions[i])
            scale_given_field[fld] = scale(r)
        end
    end
    fields = unique(fields)
    iterators = map(fld->iterator_given_field[fld], fields)
    scales = map(fld->scale_given_field[fld], fields)

    return fields, iterators, scales
end

import MLJTuning.process_random_range
using MLJTuning: Bounded, Other, PositiveUnbounded, DIST, boundedness
function MLJTuning.process_random_range(user_specified_range::AbstractVector,
                              bounded,
                              positive_unbounded,
                              other)
    # r not paired:
    stand(r) = throw(ArgumentError("Unsupported range #2. "))
    stand(r::NumericRange) = stand(r, boundedness(r))
    stand(r::NumericRange, ::Type{<:Bounded}) = (r.field, sampler(r, bounded))
    stand(r::NumericRange, ::Type{<:Other}) = (r.field, sampler(r, other))
    stand(r::NumericRange, ::Type{<:PositiveUnbounded}) =
        (r.field, sampler(r, positive_unbounded))
    stand(r::NominalRange) = (n = length(r.values);
                              (r.field, sampler(r, fill(1/n, n))))
    # (r, d):
    stand(t::Tuple{ParamRange,Any}) = stand(t...)
    stand(r, d) = throw(ArgumentError("Unsupported range #3. "))
    stand(r::NominalRange, d::AbstractVector{Float64}) =  _stand(r, d)
    stand(r::NumericRange, d:: Union{DIST, Type{<:DIST}}) = _stand(r, d)
    _stand(r, d) = (r.field, sampler(r, d))

    # (field, s):
    stand(t::Tuple{Union{Symbol,Expr, Vector{Union{Symbol,Expr}}},Any}) = t

    return  Tuple(stand.(user_specified_range))

    # ret = zip(stand.(user_specified_range)...) |> collect
    # return first(ret), last(ret)
end

Copy and paste directly into the REPL. This is just a quick workaround to avoid package problems while updating two packages at once. Hope this helps.

ablaom commented 3 years ago

@Leonardbcm @DilumAluthge

Okay, I have more carefully studied the previous discussions. An earlier solution of yours ought to have worked but for a bug in the implementation of update.

I would like to sort this out before proceeding further with the "diagonal" proposal, although I very much appreciate the detailed design given above, which looks good. I hope you won't mind.

leonardtschora commented 3 years ago

Hi,

Sorry, I have been on a break for two weeks. I am very happy to see that my work has proved useful for the MLJ workflow. As I now need to move forward with my project, I will use my forked version of MLJ, including the "diagonal" hyper-parameter coupling. I will let you know if I encounter a bug or something.

Thanks anyway for the support and responsiveness!

ablaom commented 3 years ago

@Leonardbcm Thanks for the update.

We have in fact resolved the issue with composite models not retraining when they should, so that your naive attempts to address your use-case should now work with MLJBase 0.15.0. Can you confirm this change indeed allows you to address your use case along the original lines (without the "diagonal" HP coupling)?

You can see an example of HP coupling in a composite model here.

leonardtschora commented 3 years ago

I have seen the changes you brought to MLJBase and I have tried them on your example after updating the packages. I have run your example, but I don't understand why changing the ridge_solver should, as the comment in your example says, # retrain everything.

Moreover, running the code does not update the existing machines but creates new ones:

X, y = make_regression(100, 16);
composite = MyComposite(KMeans(), MLJLinearModels.Analytical(), 10)
mach = machine(composite, X, y) |> fit!

[ Info: Training Machine{MyComposite} @363.
[ Info: Training Machine{KMeans} @195.
[ Info: Training Machine{RidgeRegressor} @016.
Machine{MyComposite} @363 trained 1 time.
  args:
    1:  Source @081 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:  Source @233 ⏎ `AbstractArray{Continuous,1}`

composite.ridge_solver = MLJLinearModels.Analytical(iterative=true)
fit!(mach)

[ Info: Updating Machine{MyComposite} @363.
[ Info: Training Machine{KMeans} @918.
[ Info: Training Machine{RidgeRegressor} @116.
Machine{MyComposite} @363 trained 2 times.
  args:
    1:  Source @081 ⏎ `Table{AbstractArray{Continuous,1}}`
    2:  Source @233 ⏎ `AbstractArray{Continuous,1}`

It seems to me that we want the KMeans model not to retrain (since it comes before the regression).

I am going to try some other examples of my own to see the effect of the new update function.

leonardtschora commented 3 years ago

Here are some simple examples to check if things are working properly:

##########
using MLJBase: @mlj_model
@mlj_model mutable struct SubModel <: Unsupervised
    hp1::Int = 2
end

function MLJ.fit(model::SubModel, verbosity::Integer, X)
    fitresult = model.hp1
    cache = nothing
    report = nothing
    return fitresult, cache, report
end

function MLJ.transform(model::SubModel, fitresult, X)
    return fitresult * X
end

@mlj_model mutable struct Composite1 <: UnsupervisedComposite
    hp1::Int # A coupling hp
    hp2::Int
end

function MLJ.fit(model::Composite1, verbosity::Integer, X)   
    Xs = source(X)

    # Create a model A with the hyper-parameter hp1
    sub = SubModel(; hp1=model.hp1)
    mach = machine(sub, Xs)
    out = transform(mach, Xs)

    ret = machine(Unsupervised(), Xs; transform=out)
    return!(ret, model, verbosity)
end

X = rand(100, 3)
mod = Composite1()
mach = machine(mod, X)

julia> fit!(mach)
[ Info: Training Machine{Composite1} @563.
[ Info: Training Machine{SubModel} @359.
Machine{Composite1} @563 trained 1 time.
  args:
    1:  Source @719 ⏎ `AbstractArray{Continuous,2}`

Machine number 359 is created for wrapping the SubModel.

# Modifying hp2 should do nothing since hp2 is not used.
# In the worst case, the sub model sub should at least be untouched.
mod.hp2 = 1

julia> fit!(mach)
[ Info: Updating Machine{Composite1} @563.
[ Info: Training Machine{SubModel} @804.
Machine{Composite1} @563 trained 2 times.
  args:
    1:  Source @719 ⏎ `AbstractArray{Continuous,2}`

A new machine (number 804) wrapping the submodel has been created, and this is not the expected behavior.

# We expect to have an update of the machine wrapping submodel
mod.hp1 = 1

julia> fit!(mach)
[ Info: Updating Machine{Composite1} @563.
[ Info: Training Machine{SubModel} @564.
Machine{Composite1} @563 trained 3 times.
  args:
    1:  Source @719 ⏎ `AbstractArray{Continuous,2}`

Again, a new machine (number 564) is created.

This example is with the submodel as an attribute of the composite:

mutable struct Composite2 <: UnsupervisedComposite
    sub::SubModel # the submodel as an attribute
    hp2::Int
end

function MLJ.fit(model::Composite2, verbosity::Integer, X)   
    Xs = source(X)

    mach = machine(model.sub, Xs)
    out = transform(mach, Xs)

    ret = machine(Unsupervised(), Xs; transform=out)
    return!(ret, model, verbosity)
end

mod = Composite2(SubModel(), 2)
mach = machine(mod, X)

julia> fit!(mach)
[ Info: Training Machine{Composite2} @426.
[ Info: Training Machine{SubModel} @764.
Machine{Composite2} @426 trained 1 time.
  args:
    1:  Source @324 ⏎ `AbstractArray{Continuous,2}`

Machine number 764 wraps the submodel.

# Here, we expect to have an update of the sub model machine
mod.sub.hp1 = 1

julia> fit!(mach)
[ Info: Updating Machine{Composite2} @426.
[ Info: Updating Machine{SubModel} @764.
Machine{Composite2} @426 trained 2 times.
  args:
    1:  Source @324 ⏎ `AbstractArray{Continuous,2}`

This part is working as expected.

# We expect no refitting, or at least we want the submodel not to be retrained, because it is independent of hp2.
mod.hp2 = 10

julia> fit!(mach)
[ Info: Updating Machine{Composite2} @426.
[ Info: Training Machine{SubModel} @388.
Machine{Composite2} @426 trained 3 times.
  args:
    1:  Source @324 ⏎ `AbstractArray{Continuous,2}`

A new machine for the submodel is created. EDIT: I switched back to the previous MLJ version (0.12.1) and found that the second example behaves as I want it to (the SubModel is updated when changing hp1, and is not retrained when changing hp2). However, case 1 is still not working: nothing is updated when we change hp1.

OkonSamuel commented 3 years ago

@Leonardbcm Let me clarify some things. @ablaom's patch actually fixes a bug which you didn't notice. Let me explain. If you have a composite model

@mlj_model mutable struct Composite1 <: UnsupervisedComposite
    hp1::Int # A coupling hp
    hp2::Int
end

function MLJ.fit(model::Composite1, verbosity::Integer, X)   
    Xs = source(X)

    # Create a model A with the hyper-parameter hp1
    sub = SubModel(; hp1=model.hp1)
    mach = machine(sub, Xs)
    out = transform(mach, Xs)

    ret = machine(Unsupervised(), Xs; transform=out)
    return!(ret, model, verbosity)
end
X = rand(100, 3)
mod = Composite1()
mach = machine(mod, X)

calling fit!(mach) a second time calls the update method, which previously (MLJ 0.12.1) gave a wrong result (despite updating and not creating new machines), because the hp in sub = SubModel(; hp1=model.hp1) isn't updated. So this new behavior (creating new machines in the learning network) is required, as MLJ's fallback update method for composites doesn't know whether hp1 is actually used in the composite model implementation, let alone that the SubModel in the implementation depends on hp1. The fallback update method would have worked well if hp1 were an MLJ.Model instance.

Users are allowed to define a custom update method for their Composite model to address all points you noted in your last comment.
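For instance, such a custom update for Composite1 could mirror the MLJ.update workaround defined earlier for C. The sketch below is rough and makes assumptions not present in the code above: it assumes fit is rewritten to return the terminal node as fitresult and to stash the submodel in a cache of the form (sub=sub,), so the return!-based fit shown above would need adjusting accordingly:

# Rough sketch only -- assumes `fitresult` is the terminal node and `cache == (sub=sub,)`
function MLJ.update(model::Composite1, verbosity::Integer, old_fitresult, old_cache, X)
    old_cache.sub.hp1 = model.hp1    # push the coupled value into the existing submodel
    fit!(old_fitresult; verbosity=verbosity)
    return old_fitresult, old_cache, nothing
end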

ablaom commented 3 years ago

@Leonardbcm Thanks for those extensive investigations.

Yes, I should have said that "smart" training does not apply to hyperparameters of a composite that are not associated with models in the learning network (in particular, any non-model hyperparameter). So, for example, if a composite hyperparameter is just a float, then changing that will always trigger a retraining of the whole learning network (because, in fact, the network will have to be rebuilt to splice in the change). I don't believe there is any way in the existing design to improve this. It might be possible to customise update to do this, but you may need more than casual knowledge of how things work under the hood.

With this new understanding, do you argue there is still a use-case for the "diagonal" business?

P.S. As a side note, I should warn you that no hyperparameter is allowed to be mutated within fit. This applies to all models, not just composites. https://github.com/alan-turing-institute/MLJ.jl/issues/654

leonardtschora commented 3 years ago

Hi, thanks for your answers; I see now what version 0.13.1 fixes.

So this new behavior(creating new machines in the learning network) is required as MLJ fallback update method for Composites doesn't know if hp1 is actually used in the Composite model implementation talk-less of knowing that SubModel in the implementation depends on hp1.

Indeed, now, models that have machines created inside their fit methods using their hyper-parameters are taken into account when updating the top model, and thus they are re-wrapped and trained whenever one of those hyper-parameters changes. This was not the case in v0.12.1.

Yes, I should have said that "smart" training does not apply to hyperparameters of a composite that are not associated with models in the learning network (in particular, any non-model hyperparameter).

And this is where my confusion was: I was expecting a change to any hyper-parameter (whether it is used in the model itself or for creating another model inside) to cause smart retraining (updating machines rather than re-building them).

P.S. As a side note, I should warn you that no hyperparameter is allowed to be mutated within fit. This applies to all models, not just composites.

This last point explains everything: re-training nested machines can't work if one is not allowed to mutate hyper-parameters inside the fit method. Thus, we are obliged to create a new sub-model inside fit and therefore to re-wrap it in a new machine (perhaps it's possible to re-use the same machine?).

Anyway, where I found this behavior weird is in my second example, where models are attributes of the main model. You can see in the last fit! that changing hp2 caused an unwanted re-wrapping of the submodel, whereas changing hp1 did an update. Maybe this case is avoidable?

With this new understanding, do you argue there is still a use-case for the "diagonal" business?

I agree that it is now possible to bind hyper-parameters, because it is possible, in the fit method, to use hyper-parameters of the top model to instantiate sub-models. I still have two remarks on the matter. 1) This will only work for user-defined composite models; my forest example would not be feasible:

# For some reason, we want our tree to have min_samples_leaf == max_depth
fields = Vector{Union{Symbol,Expr}}([:min_samples_leaf, :max_depth])

2) Smart refitting would not work without redefining update, and this is a real bottleneck for my use case. Currently, I am using a local copy of MLJBase and MLJTuning along with MLJ v0.12.1 and the "diagonal" hyper-parameter design I proposed, and I am satisfied with the way it works, although I must admit it is conceptually weird to bind hyper-parameters this way (this is supposed to be model logic rather than a grid-search constraint).

ablaom commented 3 years ago

Anyway, where I found this behavior weird is for my second example, where models are attributes of the main model. You can see in the last fit! that changing hp2 caused an unwanted re-wrapping of the submodel whereas changing "hp1" did an update. Maybe this case is avoidable?

I agree it is weird (but safe). I don't immediately see a straightforward way to (generically) avoid this in the current design.

Regarding your proposal, which I have given some further thought: I think it could be worth the extra complication if it is generalized slightly. Let's take the random forest example. Instead of the constraint that :min_samples_leaf and :max_depth be identical, a more likely scenario is that I am generating pairs of values depending on two functions f_1 and f_2 that map values p in the "range" 2 to 5 (say) to pairs of hyperparameter values model.min_samples_leaf = f_1(p) and model.max_depth = f_2(p). We already allow the scale keyword to specify a function (instead of a symbolic scale, like :log) like this:

r = range(model, :min_samples_split, lower=2, upper=10, scale= x->x^2)
julia> iterator(r, 5)
5-element Array{Int64,1}:
   4
  16
  36
  64
 100

Building on your syntax (but note the explicit specification of the type Int in place of model) the generalization I am suggesting would look something like this:

r = range(Int, [:min_samples_split, :max_depth], lower =2, upper=10, scale = x-> (x, x^2))
julia> iterator(r, 5)
5-element Array{Tuple{Int64,Int64},1}:
 (2, 4)
 (4, 16)
 (6, 36)
 (8, 64)
 (10, 100)

Note that it would be the responsibility of the user to ensure that the values have the correct type (we allow coupling of hyperparameters of different types). The Int specification does not refer to the values of the functions, only to the domain, which the user ought to specify. This is not actually what happens in the current implementation of scale (it will round things if T <: Integer), but I think it's conceptually more correct.

If scale is not specified, we could have a "diagonal" default (in the case fields is a Vector{Symbol}) of x -> (x, x, .... x). In the implementation, I don't think you would need to overload recursive_setparameter! as you propose, only be sure to broadcast it over the (sometimes vector) arguments field, value. So calls in Grid, for example, would look something like this:

broadcast((f, m) -> recursive_setparameter!(model, f, m), field, value)

What do you think?

[P.S. There is one little issue to be addressed here and in your existing proposal, which is in reporting results in tuning: https://github.com/alan-turing-institute/MLJTuning.jl/blob/3b1b35db56da5c764dd5248259d14bc8bd3db790/src/utilities.jl#L17 . In your POC, you will get an error if you try to plot(tuned_mach) after a two-parameter tune, for example. I think we just mash the field name strings into one? Or something like this.]

leonardtschora commented 3 years ago

I think the idea of generating values such as x -> (f1(x), f2(x), ...) is an excellent generalization. It would also give the user more flexibility and more control over the parameter grid.

I don't think you would need to overload recursive_setparameter! as you propose, only be sure to broadcast it over the (sometimes vector) arguments field, value.

My overloading of recursive_setproperty! consisted of broadcasting over the arrays anyway.

broadcast((f, m) -> recursive_setparameter!(model, f, m), field, value)

This can be written recursive_setparameter!(Ref(model), field, value)

The Int specification does not refer to the value of the functions only the domain, which the user ought to specify.

If my understanding is correct, you need the user to specify the type of the x later given to the functions in scale, which can return values of any type?

P.S. There is one little issue to be addressed here and in your existing proposal, which is in reporting results in tuning

I am using the plot function to visualize my grid search results and don't have any problems with it. I think the line parameter_names=string.(fields) |> collect does the job even if some elements of fields are arrays.

fields = [:foo, [:bar, :baz]]
string.(fields)

2-element Array{String,1}:
 "foo"
 "[:bar, :baz]"

It is a bit ugly but won't trigger a bug. We could do with:

string.(fields .|> x -> x isa Union{Symbol, Expr} ? x : .*(x...))

2-element Array{String,1}:
 "foo"
 "barbaz"

or inserting delimiters (spaces?) between each coupled field, etc.

ablaom commented 3 years ago

This can be written recursive_setparameter!(Ref(model), field, value)

Nice trick!

How about:

2-element Array{String,1}:
 "foo"
 "coupled(bar, baz)"

?

If my understanding is correct, you need the user tu specify the type of the x later given to the functions in scale, which can return values of any types?

Yes. In the case the specified scale is not a symbol, or field is a vector (and scale is reset), we issue an error if !(model isa Type):

throw(ArgumentError( "Range object will generate values according to `x -> scale(x)` where `x` is assumed to have the type of the first argument provided the `range` constructor, which was not a type. "))

And, for consistency, we change the behaviour of iterator and sampler in the case that field is a single symbol, so that raw values of scale(x) are returned regardless of the type of the T parameter of the range object. Currently, if T <: Integer we round the output. This is breaking.

What do you think?

cc @CameronBieganek

leonardtschora commented 3 years ago

This can be written recursive_setparameter!(Ref(model), field, value)

Of course, we need to write the broadcast operator: recursive_setparameter!.(Ref(model), field, value). The Ref wrapper just "tricks" the broadcasting machinery, forcing Julia to reuse the same model at each iteration while it broadcasts over field and value.
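A tiny illustration of the Ref trick with an ordinary function:

# Ref makes its argument behave as a scalar under broadcasting:
getfield.(Ref((a=1, b=2)), [:a, :b])    # returns [1, 2]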

2-element Array{String,1}:
 "foo"
 "coupled(bar, baz)"

This has to be written differently; I forgot that in my Julia environment I have specialized the * operator so it can concatenate symbols.

function format_arr(arr)
    s = "coupled("
    for el in arr
        s *= string(el)
        s *= ", "
    end
    return s[begin:end-2] * ")"
end

fields = [:foo, [:bar, :baz], [:(biz.boz), :(boz.biz)]]
fields .|> x -> x isa Union{Symbol, Expr} ? x : format_arr(x)

3-element Array{Any,1}:
 :foo
 "coupled(bar, baz)"
 "coupled(biz.boz, boz.biz)"
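For what it's worth, the same formatting can be written more tersely with join (just a sketch; it returns strings for the uncoupled fields too):

fmt(x) = x isa Union{Symbol,Expr} ? string(x) : "coupled(" * join(string.(x), ", ") * ")"
fmt.([:foo, [:bar, :baz], [:(biz.boz), :(boz.biz)]])
# ["foo", "coupled(bar, baz)", "coupled(biz.boz, boz.biz)"]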

As for the rest I agree conceptually and I'm currently diving into the code in order to see what needs to be changed.