Also, `foreach` does not make sense as a setting in our use case.
If we now add more optimizers, we should reconsider what arguments our optimizers should have and check that they make sense.
@ckolbPTB @koflera
Currently we have these parameters:

- `amsgrad`: whether to use the AMSGrad variant of this algorithm from the paper `On the Convergence of Adam and Beyond`, by default False
- `foreach`: whether the `foreach` implementation of the optimizer is used, by default None
- `maximize`: maximize the objective with respect to the params, instead of minimizing, by default False
- `differentiable`: whether autograd should occur through the optimizer step. This is currently not implemented.
- `fused`: whether the fused implementation (CUDA only) is used. Currently torch.float64, torch.float32, torch.float16, and torch.bfloat16 are supported. By default None.
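For context, these names match keyword arguments of `torch.optim.Adam`. A minimal sketch of how they would be passed through, assuming our wrapper simply forwards them to PyTorch (the values shown are just the PyTorch defaults):

```python
import torch

params = [torch.zeros(3, requires_grad=True)]

# Hypothetical forwarding of the settings listed above to torch.optim.Adam.
optimizer = torch.optim.Adam(
    params,
    lr=1e-3,
    amsgrad=False,          # use the AMSGrad variant if True
    foreach=None,           # let PyTorch decide on the foreach implementation
    maximize=False,         # maximize instead of minimize the objective
    differentiable=False,   # allow autograd through the optimizer step
    fused=None,             # fused CUDA implementation if available
)
```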
Suggestion: remove them all and only add them back if we really need them.
I think I agree with the suggestion. The only perhaps useful parameter would be `maximize`, but one can easily implement this by setting the functional `f` to be maximized to `-f` and minimizing that instead.
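A minimal sketch of this workaround (standalone PyTorch with a hypothetical objective `f`, not code from this repository):

```python
import torch

def f(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical objective to maximize; its maximum is at x = 2."""
    return -((x - 2.0) ** 2).sum()

x = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([x], lr=0.1)

for _ in range(500):
    optimizer.zero_grad()
    loss = -f(x)  # minimizing -f is equivalent to maximizing f
    loss.backward()
    optimizer.step()

print(x)  # approaches 2, the maximizer of f
```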
I would say the only useful parameter is `amsgrad`.
Everything else either does not make sense, does not matter, or can be achieved by setting `f` to `-f`.
We have a `differentiable` flag in the Adam optimizer. This will not work as expected, as we always detach the parameters. Also, we should reconsider which of the optimizer flags we actually support. I removed the `capturable` flag in #193, as it is not useful for us.
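A minimal illustration of why the detach defeats `differentiable` (standalone PyTorch, not the actual optimizer code in this repository):

```python
import torch

# Initial parameter we would like to differentiate the optimization result with respect to.
x0 = torch.tensor([1.0], requires_grad=True)

# If the parameters are detached (as we do after each step), the graph back to x0
# is cut and no gradient can flow through the update, regardless of whether
# differentiable=True is set on the torch optimizer.
x = x0.detach().clone().requires_grad_(True)

loss = (x ** 2).sum()
loss.backward()

print(x.grad)   # tensor([2.]) -- gradient w.r.t. the detached copy
print(x0.grad)  # None -- nothing reaches the original parameter
```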