FalkonML / falkon

Large-scale, multi-GPU capable, kernel solver
https://falkonml.github.io/falkon/
MIT License

Custom hyperparameter optimization #50

Closed · arthurfl closed 2 years ago

arthurfl commented 2 years ago

Hi,

I'd like to implement a hyperparameter optimization procedure based on minimizing a loss function computed on a validation set, in order to preserve transferability as much as possible.

I was previously using the built-in hopt classes in the following way:

import torch
from falkon.hopt.objectives import SGPR

model = SGPR(
    kernel=kernel, penalty_init=penalty_init, centers_init=centers_init,
    opt_penalty=True, opt_centers=False)

opt_hp = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(100):
    opt_hp.zero_grad()
    loss = model(X_train, Y_train)
    loss.backward()
    opt_hp.step()

What I'm trying to implement now should probably look like this:

model = SGPR(
    kernel=kernel, penalty_init=penalty_init, centers_init=centers_init,
    opt_penalty=True, opt_centers=False)

opt_hp = torch.optim.Adam(model.parameters(), lr=lr)
loss_fn = torch.nn.L1Loss()

for epoch in range(100):
    opt_hp.zero_grad()
    model(X_train, Y_train)
    loss = loss_fn(model.predict(X_val), Y_val)
    loss.requires_grad = True
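    # NB: `loss` does not require grad here, so the line above turns it into a
    # new graph leaf; backward() then never reaches the model hyperparameters.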
    loss.backward()
    opt_hp.step()

But the loss doesn't change during optimization; the hyperparameters are probably not being updated at all. Could this be related to the computation of dLoss/dx? Should I use an instance of falkon.Falkon instead of one of the falkon.hopt.objectives to define the model? (If I remember correctly, I had keops- or CUDA-related issues with falkon.Falkon.)

many thanks, Arthur

Giodiro commented 2 years ago

Hi Arthur, unlike what you commonly see in PyTorch models, the forward pass of the hopt objectives already calculates the loss (so the call model(X_train, Y_train) returns the loss on which you need to call backward). You should probably implement a new model (say, L1SGPR) whose forward pass computes a penalized L1 loss. In particular, if you look at the original SGPR model, you may only need to change the data-fit terms of the loss, but you will need to go through the derivation of the SGPR equations to better understand what's going on!
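
For illustration, here is a minimal, self-contained sketch of that pattern, written in plain torch rather than against falkon's internals (the class name, the Gaussian kernel helper, and the Nyström-KRR solve are hypothetical stand-ins; the real SGPR objective is more involved):

import torch


class L1HoldoutObjective(torch.nn.Module):
    # Hypothetical sketch: a differentiable Nystrom-KRR solve whose forward
    # pass returns an L1 loss on held-out validation data, so backward()
    # can propagate gradients to the penalty (and optionally the centers).
    def __init__(self, centers_init, penalty_init, sigma_init, opt_centers=False):
        super().__init__()
        self.centers = torch.nn.Parameter(centers_init.clone(), requires_grad=opt_centers)
        self.log_penalty = torch.nn.Parameter(torch.log(torch.tensor(float(penalty_init))))
        self.log_sigma = torch.nn.Parameter(torch.log(torch.tensor(float(sigma_init))))

    def kernel(self, A, B):
        # Plain-torch Gaussian kernel, so gradients also flow to log_sigma.
        sigma = torch.exp(self.log_sigma)
        return torch.exp(-torch.cdist(A, B).pow(2) / (2 * sigma ** 2))

    def forward(self, X_train, Y_train, X_val, Y_val):
        n, m = X_train.shape[0], self.centers.shape[0]
        penalty = torch.exp(self.log_penalty)
        knm = self.kernel(X_train, self.centers)
        kmm = self.kernel(self.centers, self.centers)
        # Nystrom-KRR normal equations: (K_nm^T K_nm + n * penalty * K_mm) alpha = K_nm^T y
        H = knm.T @ knm + n * penalty * kmm
        H = H + 1e-6 * torch.eye(m, dtype=H.dtype, device=H.device)  # jitter
        alpha = torch.linalg.solve(H, knm.T @ Y_train)
        # Validation predictions stay inside the autograd graph, unlike the
        # output of a separate predict() call on a fitted model.
        preds = self.kernel(X_val, self.centers) @ alpha
        return torch.nn.functional.l1_loss(preds, Y_val)

The training loop then looks exactly like the first snippet above, except that the forward pass also receives the validation data:

model = L1HoldoutObjective(centers_init, penalty_init, sigma_init=1.0)
opt_hp = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(100):
    opt_hp.zero_grad()
    loss = model(X_train, Y_train, X_val, Y_val)
    loss.backward()
    opt_hp.step()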

arthurfl commented 2 years ago

Thanks for the quick reply! I could implement something starting from one of the exact objective functions. Incidentally, I used sgpr.py, but my loss function is nothing like the SGPR one: it is simply an L1 loss computed on a validation set, which is distinct from the centers used in KRR.

best, Arthur

Giodiro commented 2 years ago

Then maybe you could start from https://github.com/FalkonML/falkon/blob/master/falkon/hopt/objectives/exact_objectives/holdout.py, which does the train/validation splitting (but uses the MSE).
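
The adaptation would then essentially amount to swapping the squared-error term for an absolute-error one in the objective's forward pass (a sketch only; the variable names are hypothetical and the actual structure of holdout.py may differ):

# inside the objective's forward pass, after computing validation predictions:
# loss = torch.mean((preds - Y_val) ** 2)     # squared error, holdout.py-style
loss = torch.mean(torch.abs(preds - Y_val))   # L1 replacement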

arthurfl commented 2 years ago

Thanks for the suggestion. Actually, I'd rather keep the regression/optimization step separate from the data-splitting step, as I need datasets with uniformly distributed Y's. Anyway, starting from sgpr.py did the trick!
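
For reference, one way to build such a split is to stratify on quantile bins of Y (a sketch, not from the thread; the function name is hypothetical):

import torch

def uniform_y_split(X, Y, val_frac=0.2, n_bins=10):
    # Stratified holdout split: bin the targets into quantile bins and draw
    # the same fraction from each bin, so the validation Y's cover the
    # target range evenly.
    y = Y.flatten()
    edges = torch.quantile(y, torch.linspace(0, 1, n_bins + 1, dtype=y.dtype))
    val_parts = []
    for i in range(n_bins):
        upper = y <= edges[i + 1] if i == n_bins - 1 else y < edges[i + 1]
        in_bin = ((y >= edges[i]) & upper).nonzero().flatten()
        perm = in_bin[torch.randperm(in_bin.numel())]
        val_parts.append(perm[: int(in_bin.numel() * val_frac)])
    val_idx = torch.cat(val_parts)
    train_mask = torch.ones(y.numel(), dtype=torch.bool)
    train_mask[val_idx] = False
    return X[train_mask], Y[train_mask], X[val_idx], Y[val_idx]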