Closed dpaetzel closed 1 month ago
It seems likely `TunedModel` is a stretch for your purpose. I'm a little confused because you say you are already doing optimization internally - so what then is the role of MLJTuning.jl?

There is the possibility of using a custom selection heuristic. By default, `TunedModel` just picks out the model (hyper-parameter set) with the lowest loss / highest score, but a custom selection heuristic allows for something different (e.g., "parsimonious" selection), so long as you can make a decision based on just the model evaluation history. Currently, what is written to the history is not controlled by the user (some of it depends on the tuning strategy, such as `Grid`, `RandomSearch`, etc.), but perhaps we could add a `TunedModel` hyperparameter `f` that gets applied to each `(model, fitresult)` (e.g. returning model complexity), and we could arrange to have that written to the history in all cases. Just brainstorming here...
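The "parsimonious" idea can be sketched in plain Julia. In MLJTuning proper this would be wired up by subtyping `MLJTuning.SelectionHeuristic` and extending `MLJTuning.best`; the sketch below shows only the core selection logic, on mock history entries whose field names (`model`, `measurement`) and the user-supplied `complexity` function are illustrative assumptions:

```julia
# Sketch of a "parsimonious" selection rule over a tuning history.
# Among all entries whose loss is within `tol` of the best loss,
# pick the one a user-supplied `complexity` function rates simplest.
function parsimonious_best(history, complexity; tol=0.0)
    losses = [entry.measurement[1] for entry in history]
    best_loss = minimum(losses)
    # keep entries whose loss is within `tol` of the best...
    candidates = [e for (e, l) in zip(history, losses) if l <= best_loss + tol]
    # ...and among those pick the simplest
    return argmin(e -> complexity(e.model), candidates)
end

# mock history: the deeper tree scores marginally better here
history = [
    (model = (max_depth = 5,), measurement = [0.101]),
    (model = (max_depth = 2,), measurement = [0.103]),
    (model = (max_depth = 4,), measurement = [0.150]),
]

entry = parsimonious_best(history, m -> m.max_depth; tol=0.005)
# entry.model.max_depth == 2: the shallow tree wins despite a
# marginally worse loss
```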
Thanks for responding!
> It seems likely TunedModel is a stretch for your purpose. I'm a little confused because you say you are already doing optimization internally - so what then is the role of MLJTuning.jl?
Sorry, I may not have been clear enough there. I meant internal optimization in the sense of candidate model fitting (i.e. model parameter optimization, e.g. fitting an NN's parameters to the data using backprop; just, in my case, an Evolutionary Algorithm is used instead of gradient descent).
> There is the possibility of using a custom selection heuristic.
Thank you, I had somehow missed that.
As I understand it, the selection heuristic `best(…)` is applied only at the last step. This means that for strategies like `Grid`, `RandomSearch`, `LatinHypercube` (where the next search point is independent of the search history so far), if I were able to write additional things to the history (like you proposed), this could work; I'd simply set `measure` to an arbitrary thing (e.g. `mae`).

However, for more sophisticated strategies like `TreeParzen` this is probably not enough because, as I understand it, each next search point is selected based on the history of hyperparametrizations and `measure` values.

I myself will probably stick with the simpler `Grid`/`LatinHypercube`-based approaches for now anyways, so the latter wouldn't be a nuisance for me at the moment.
I'll look into this a bit more and then comment on what I find out.
I was able to adjust `tuned_models.jl` such that the following code works (`myextra` only returns the `tree` field for simplicity's sake):
```julia
using MLJ

DTRegressor = @load DecisionTreeRegressor pkg = DecisionTree verbosity = 0

N = 300
X, y = rand(N, 3), rand(N)
X = MLJ.table(X)

model = DTRegressor()
space = [
    range(model, :max_depth; lower=1, upper=5),
    range(
        model,
        :min_samples_split;
        lower=ceil(0.001 * N),
        upper=ceil(0.05 * N),
    ),
]

function myextra(model, fparams)
    # fparams == fitted_params(fitted_params(resampling_machine).machine)
    return fparams.tree
end

modelt = TunedModel(;
    model=model,
    resampling=CV(; nfolds=3),
    tuning=LatinHypercube(; gens=30),
    range=space,
    measure=mae,
    n=2,
    userextras=myextra,
)

macht = machine(modelt, X, y)
MLJ.fit!(macht; verbosity=1000)
display(report(macht).history[1].userextras)
```
However, I now have the problem that this only yields a single evaluation of `myextra` despite using 3-fold CV. What I would want is to evaluate `myextra` once for each fold, but the CV machinery (i.e. `evaluate`) is in `MLJBase`.

I guess I'll have to pass `userextras` to `evaluate` as well and then alter the `PerformanceEvaluation` struct (instances of which `evaluate` returns) … Or is there a way to have additional functions be evaluated by `evaluate` that I'm not seeing?
My bad, I just noticed that there is `PerformanceEvaluation.fitted_params_per_fold` that I can use.
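For reference, that field lives on the object returned by `evaluate`, so inspecting the per-fold fitted parameters might look like this (a sketch reusing the `DecisionTreeRegressor` setup from above; the exact printed output will of course vary):

```julia
using MLJ

DTRegressor = @load DecisionTreeRegressor pkg = DecisionTree verbosity = 0

X, y = MLJ.table(rand(300, 3)), rand(300)
e = evaluate(DTRegressor(), X, y; resampling=CV(; nfolds=3), measure=mae)

# one entry per fold; each is what `fitted_params(mach)` would return
# for a machine trained on that fold's training rows
for fp in e.fitted_params_per_fold
    println(fp.tree)
end
```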
Would you be interested in a PR that introduces the `userextras` option as showcased above? (The name is of course up for debate, but since `MLJTuning.extras(…)` fulfills a similar role at the tuning strategy level, I chose that for now.)

I'd argue that I'm probably not the only one who wants to log and later access additional metrics during optimization. And, in addition to that, for `Grid` and similar strategies, this allows writing a custom `best` that selects the best model using a measure with a different form than `f(y, yhat)` (e.g. complexity etc.).
Or is this too niche in your opinion? I'm also fine with keeping this in my personal fork for now if you don't find it useful enough. :slightly_smiling_face:
Thanks a lot for your help!
I'd be happy to review such a PR.
Don't have a great idea for the name. What about `history_additions`?
I know I said `history_additions(model, fitresult)` for the signature, but `fitresult` is not really public API. Rather, can we do `history_additions(model, fp)`, where `fp` is something of the form `fitted_params(model, fitresult)` (equivalently, `fitted_params(mach)`, where `mach = machine(model, data...) |> fit!`)?
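To make the proposed signature concrete (purely illustrative: `history_additions` is the name under discussion, not an existing API, and the field names below are made up):

```julia
# Hypothetical user hook: sees the model and the result of
# `fitted_params`, never the raw fitresult; returns whatever
# should be appended to the tuning history.
history_additions(model, fp) = (tree = fp.tree, depth = model.max_depth)

# mock stand-ins for a model and its fitted params
model = (max_depth = 3,)
fp = (tree = "fitted tree goes here",)

extras = history_additions(model, fp)
# extras.depth == 3
```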
I think this issue is resolved, albeit through a different means than originally posed.
Closing for now.
Hi, thank you for developing and maintaining this part of MLJ! :slightly_smiling_face:

I was wondering how one would go about the following:

I created a custom MLJ model type (let's call it `CustomModel`) which internally uses a custom optimizer (actually, an Evolutionary Algorithm, but this is not important) with a custom objective function `cost`[^1]. I'd now like to perform hyperparameter optimization with respect to `cost`. At that, `cost` is computed by the inner optimizer anyways (since it's optimizing for it), and I can make the `fitresult`s of `CustomModel` contain the highest `cost` value achieved by the inner optimizer. What I'd like to do is provide, instead of e.g. `measure=mae`, something like `measure=custom`, where `custom` accepts `fitresult`s (or something along those lines).

I looked mostly into `tuned_models.jl` and `resampling.jl` and (presumably since I'm not familiar enough with the code) I only saw ways to achieve this that look like a lot of work. Maybe there is another way?

So far, it looks to me like writing my own version of `TunedModel` would be less work than changing the existing code. Maybe you can give me a hint as to what I'm missing? Did no one run into this so far?

Thank you for your time!

[^1]: Let's assume that `cost` is actually a decent measure for the inner optimizer. For example, `cost` does not only include predictive performance but also model complexity and is therefore not of the form `cost(y, yhat)` but more like `cost(y, yhat, complexity, otherstuff)`.