JuliaAI / MLJTuning.jl

Hyperparameter optimization algorithms for use in the MLJ machine learning framework
MIT License

Use measures that are not of the form `f(y, yhat)` but `f(fitresult)` #202

Closed: dpaetzel closed this issue 1 month ago

dpaetzel commented 6 months ago

Hi, thank you for developing and maintaining this part of MLJ! :slightly_smiling_face:

I was wondering how one would go about the following:

I created a custom MLJ model type (let's call it CustomModel) which internally uses a custom optimizer (an Evolutionary Algorithm, actually, but that is not important here) with a custom objective function cost[^1]. I'd now like to perform hyperparameter optimization with respect to cost. Note that cost is computed by the inner optimizer anyway (since that is what it optimizes for), and I can make the fitresult of CustomModel contain the highest cost value achieved by the inner optimizer. What I'd like to do is provide, instead of e.g. measure=mae, something like measure=custom, where custom accepts the fitresult (or something along those lines).

I looked mostly into tuned_models.jl and resampling.jl, and (presumably because I'm not familiar enough with the code) I only saw ways to achieve this that look like a lot of work. Maybe there is another way?

So far, it looks to me like writing my own version of TunedModel would be less work than changing the existing code. Maybe you can give me a hint as to what I'm missing? Has no one run into this before?

Thank you for your time!

[^1]: Let's assume that cost is actually a decent measure for the inner optimizer. For example, cost includes not only predictive performance but also model complexity, and is therefore not of the form cost(y, yhat) but more like cost(y, yhat, complexity, otherstuff).
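
For concreteness, here is a minimal sketch of the kind of model I mean. It assumes only the standard MLJModelInterface contract; the model internals and the helper run_ea (standing in for the evolutionary algorithm) are hypothetical.

```julia
# Minimal sketch, assuming only the MLJModelInterface contract; `run_ea`
# is a hypothetical stand-in for the internal evolutionary algorithm,
# which minimizes a cost combining prediction error and complexity.

import MLJModelInterface as MMI

mutable struct CustomModel <: MMI.Deterministic
    n_generations::Int
end
CustomModel(; n_generations=100) = CustomModel(n_generations)

# Hypothetical inner optimizer: returns a solution and the best cost it reached.
run_ea(Xmat, y; n_generations) = (zeros(size(Xmat, 2)), 0.0)

function MMI.fit(model::CustomModel, verbosity, X, y)
    Xmat = MMI.matrix(X)
    solution, best_cost = run_ea(Xmat, y; n_generations=model.n_generations)
    fitresult = (solution=solution, best_cost=best_cost)
    report = (best_cost=best_cost,)   # also expose the cost via report(mach)
    return fitresult, nothing, report
end

# Expose the best cost so that tuning code (or a selection heuristic) can read it:
MMI.fitted_params(::CustomModel, fitresult) = fitresult

MMI.predict(::CustomModel, fitresult, Xnew) = MMI.matrix(Xnew) * fitresult.solution
```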

ablaom commented 6 months ago

It seems likely TunedModel is a stretch for your purpose. I'm a little confused because you say you are already doing optimization internally - so what then is the role of MLJTuning.jl?

There is the possibility of using a custom selection heuristic. By default, TunedModel just picks out the model (hyperparameter set) with the lowest loss / highest score, but a custom selection heuristic allows for something different (e.g., "parsimonious" selection), so long as you can make the decision based on just the model evaluation history. Currently, what is written to the history is not controlled by the user (some of it depends on the tuning strategy, such as Grid, RandomSearch, etc.), but perhaps we could add a TunedModel hyperparameter f that gets applied to each (model, fitresult) pair (e.g. returning model complexity), and we could arrange to have that written to the history in all cases. Just brainstorming here...
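
To make the idea concrete, here is a rough sketch of a custom selection heuristic, assuming the MLJTuning extension point in which one subtypes MLJTuning.SelectionHeuristic and overloads MLJTuning.best(heuristic, history). The history-entry fields used below (model, measurement) and the complexity function are assumptions, not settled API.

```julia
# Rough sketch only; `complexity` is a hypothetical user-supplied function,
# and the history-entry fields (`model`, `measurement`) are assumptions.

import MLJTuning

struct ParsimoniousSelection <: MLJTuning.SelectionHeuristic
    penalty::Float64   # trade-off between loss and model complexity
end

# Hypothetical complexity measure for the model type being tuned:
complexity(model) = model.max_depth

function MLJTuning.best(h::ParsimoniousSelection, history)
    scores = map(history) do entry
        first(entry.measurement) + h.penalty * complexity(entry.model)
    end
    return history[argmin(scores)]
end

# Depending on the MLJTuning version, one may also need to declare that a
# given strategy supports the heuristic before passing
# `selection_heuristic=ParsimoniousSelection(0.1)` to `TunedModel`.
```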

dpaetzel commented 6 months ago

Thanks for responding!

> It seems likely TunedModel is a stretch for your purpose. I'm a little confused because you say you are already doing optimization internally - so what then is the role of MLJTuning.jl?

Sorry, I may not have been clear enough there. By internal optimization I meant candidate model fitting (i.e. model parameter optimization, e.g. fitting an NN's parameters to the data using backprop); in my case, an Evolutionary Algorithm is used instead of gradient descent.

> There is the possibility of using a custom selection heuristic.

Thank you, I had somehow missed that.

As I understand it, the selection heuristic best(…) is applied only at the last step. This means that for strategies like Grid, RandomSearch and LatinHypercube (where the next search point is independent of the search history so far), this could work if I were able to write additional things to the history (as you proposed); I'd simply set measure to an arbitrary thing (e.g. mae).

However, for more sophisticated strategies like TreeParzen this is probably not enough because, as I understand it, each next search point is selected based on the history of hyperparametrizations and measure values.

I myself will probably stick with the simpler Grid/LatinHypercube-based approaches for now anyway, so the latter wouldn't be a nuisance for me at the moment.

I'll look into this a bit more and then comment on what I find out.

dpaetzel commented 6 months ago

I was able to adjust tuned_models.jl such that the following code works (for simplicity's sake, myextra just returns the tree field):

using MLJ
DTRegressor = @load DecisionTreeRegressor pkg = DecisionTree verbosity = 0

N = 300
X, y = rand(N, 3), rand(N)
X = MLJ.table(X)

model = DTRegressor()

space = [
    range(model, :max_depth; lower=1, upper=5),
    range(
        model,
        :min_samples_split;
        lower=ceil(Int, 0.001 * N),
        upper=ceil(Int, 0.05 * N),
    ),
]

function myextra(model, fparams)
    # fparams = fitted_params(fitted_params(resampling_machine).machine)
    return fparams.tree
end

modelt = TunedModel(;
    model=model,
    resampling=CV(; nfolds=3),
    tuning=LatinHypercube(; gens=30),
    range=space,
    measure=mae,
    n=2,
    userextras=myextra,  # option added in my fork (not in released MLJTuning)
)

macht = machine(modelt, X, y)
MLJ.fit!(macht; verbosity=1000)
display(report(macht).history[1].userextras)

However, I now have the problem that this only yields a single evaluation of myextra despite using 3-fold CV. What I would want is for myextra to be evaluated once per fold, but the CV machinery (i.e. evaluate) lives in MLJBase.

I guess I'll have to pass userextras to evaluate as well and then alter the PerformanceEvaluation struct (instances of which evaluate returns) … Or is there a way to have additional functions evaluated by evaluate that I'm not seeing?

dpaetzel commented 6 months ago

My bad, I just noticed that there is PerformanceEvaluation.fitted_params_per_fold, which I can use.
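
A minimal sketch of that route, assuming MLJ's evaluate!/CV API and the fitted_params_per_fold field just mentioned; accessing .tree mirrors the DecisionTreeRegressor example above.

```julia
using MLJ
DTRegressor = @load DecisionTreeRegressor pkg = DecisionTree verbosity = 0

X, y = MLJ.table(rand(300, 3)), rand(300)
mach = machine(DTRegressor(), X, y)

e = evaluate!(mach; resampling=CV(; nfolds=3), measure=mae, verbosity=0)

# One entry per fold, each being what fitted_params(mach) would return for
# the machine trained on that fold:
trees_per_fold = [fp.tree for fp in e.fitted_params_per_fold]
```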

dpaetzel commented 6 months ago

Would you be interested in a PR that introduces the userextras option as showcased above? (The name is of course up for debate but since MLJTuning.extras(…) fulfills a similar role at the tuning strategy level I chose that for now.)

I'd argue that I'm probably not the only one who wants to log, and later access, additional metrics during optimization. In addition, for Grid and similar strategies this allows writing a custom best that selects the best model using a measure of a different form than f(y, yhat) (e.g. one including complexity etc.).

Or is this too niche in your opinion? I'm also fine with keeping this in my personal fork for now if you don't find it useful enough. :slightly_smiling_face:

Thanks a lot for your help!

ablaom commented 6 months ago

I'd be happy to review such a PR.

Don't have a great idea for the name. What about history_additions?

I know I said history_additions(model, fitresult) for the signature but fitresult is not really public API. Rather, can we do history_additions(model, fp) where fp is something of the form fitted_params(model, fitresult) (equivalently, fitted_params(mach), where mach = machine(model, data...) |> fit!)?
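
Purely to illustrate the proposal (the history_additions keyword is the suggestion under discussion here, not an existing TunedModel option), reusing model and space from the snippet above:

```julia
# `fp` is assumed to be whatever fitted_params(mach) returns for the trained machine.
my_history_additions(model, fp) = (tree = fp.tree,)

modelt = TunedModel(;
    model=model,
    resampling=CV(; nfolds=3),
    tuning=LatinHypercube(; gens=30),
    range=space,
    measure=mae,
    n=2,
    history_additions=my_history_additions,  # proposed keyword, hypothetical
)
```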

ablaom commented 1 month ago

I think this issue is resolved, albeit by different means than originally proposed.

Closing for now.