hal-314 opened this issue 2 years ago (status: Open)
gradient_clip_algorithm="my_custom_clipping_algorithm"
The gradient clipping arguments within Trainer are meant to be used only with the built-in methods already implemented within Lightning. If you want an identifier, you can use some sort of hparam or store it within your module state, for example:
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def __init__(self, clip_algo, **kwargs):
        super().__init__()
        self.clip_algo = clip_algo

    # Recent Lightning versions pass (optimizer, gradient_clip_val, gradient_clip_algorithm);
    # older versions also pass an optimizer_idx argument.
    def configure_gradient_clipping(self, optimizer, gradient_clip_val=None, gradient_clip_algorithm=None):
        if self.clip_algo == 'my_custom_clipping_algorithm':
            my_custom_clipping(optimizer, gradient_clip_val)  # user-defined clipping function
        else:
            # fall back to the built-in clipping
            self.clip_gradients(optimizer, gradient_clip_val=gradient_clip_val,
                                gradient_clip_algorithm=gradient_clip_algorithm)
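For completeness, a hedged usage sketch of this workaround; LitModel, the clip_algo hparam, and the clip value 0.5 are illustrative names and numbers from the snippet above, not anything prescribed by Lightning:

```python
from pytorch_lightning import Trainer

model = LitModel(clip_algo="my_custom_clipping_algorithm")
trainer = Trainer(gradient_clip_val=0.5)  # the clip value is still forwarded to the hook
trainer.fit(model)
```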
@rohitgr7 I agree that we could improve on this and then just error out within self.clip_gradients
@justusschock well, some of the plugins don't support custom gradient clipping, so checking this value as early as possible during init seems more reliable to me. Also, those flags are only meant for cases where someone wants to use the built-in algorithms.
@rohitgr7 We can check during init and then error out if it is not supported by the plugin (like we currently do), and for plugins that don't care we could do lazy checking.
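To make the idea concrete, a purely hypothetical sketch of what such a lazy check could look like; this is not Lightning's actual internals, and the helper name and error message are assumptions, though "norm" and "value" are the documented built-in algorithm names:

```python
from pytorch_lightning.utilities.exceptions import MisconfigurationException

SUPPORTED_CLIP_ALGORITHMS = {"norm", "value"}  # the documented built-in algorithms

def _check_clip_algorithm(gradient_clip_algorithm):
    # Hypothetical lazy validation, run only when clipping is actually about to happen
    # rather than at Trainer init time.
    if gradient_clip_algorithm not in SUPPORTED_CLIP_ALGORITHMS:
        raise MisconfigurationException(
            f"gradient_clip_algorithm={gradient_clip_algorithm!r} is not a built-in algorithm; "
            "override configure_gradient_clipping to use a custom one."
        )
```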
As far as I know, they are/were meant to switch between built-in algorithms, but this doesn't seem intuitive to me.
I find it quite confusing that these flags are meant to be used only with the built-in algorithms. If someone implements a custom gradient clipping algorithm, the user would have two options:

1. Keep gradient_clip_val=None (the default value), since the docs say that gradient clipping is then disabled. In this case, will configure_gradient_clipping be called? If it isn't, we would force users to pass some arguments through Trainer flags and others through the LightningModule. In the original example, users would set the clipping value through a Trainer flag and the algorithm through the LightningModule.
2. Forbid the Trainer flags gradient_clip_algorithm and gradient_clip_val when the module implements configure_gradient_clipping (see the sketch after this list). This would allow easy discoverability (as said in #10528), reduce BC, avoid undefined behavior, and leave only one way to configure gradient clipping. Note that if going with this option, it may be necessary to add something like should_gradients_be_clipped so that Lightning does not unscale gradients when it isn't necessary (simulating the current gradient_clip_val=None behavior).
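A minimal sketch of that second option, assuming a hypothetical user-defined my_custom_clipping function and the current hook signature (older Lightning versions also receive an optimizer_idx argument):

```python
from pytorch_lightning import LightningModule, Trainer

class CustomClipModule(LightningModule):
    def configure_gradient_clipping(self, optimizer, gradient_clip_val=None, gradient_clip_algorithm=None):
        # The module owns clipping entirely; no Trainer clipping flags are involved.
        my_custom_clipping(optimizer, clip_val=0.01)  # hypothetical user-defined function

# Under this proposal, also passing gradient_clip_val / gradient_clip_algorithm here would be an error.
trainer = Trainer()
```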
🐛 Bug
I can't pass a custom gradient clipping algorithm, although I implemented the configure_gradient_clipping hook. The documentation and release notes hint that you can use configure_gradient_clipping to implement your own clipping algorithm (release notes: "... This means you can now implement state-of-the-art clipping algorithms with Lightning! ..."). Please allow passing custom algorithm names in gradient_clip_algorithm when the model implements configure_gradient_clipping.
To Reproduce
Expected behavior
Environment
Additional context
I want to train NFNets with Adaptive Gradient Clipping and compare with standard L2 gradient clipping.
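For reference, a minimal sketch of the kind of custom clipping meant here: a simplified per-tensor variant of Adaptive Gradient Clipping (the NFNets paper applies it unit-wise); the clip_factor and eps defaults are illustrative:

```python
def adaptive_gradient_clipping(parameters, clip_factor=0.01, eps=1e-3):
    # Simplified per-tensor AGC: scale a gradient down when its norm exceeds
    # clip_factor times the corresponding parameter norm.
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = p.detach().norm().clamp(min=eps)
        grad_norm = p.grad.detach().norm()
        max_norm = clip_factor * param_norm
        if grad_norm > max_norm:
            p.grad.detach().mul_(max_norm / (grad_norm + 1e-6))
```

Called from configure_gradient_clipping in place of self.clip_gradients, this would cover the AGC side of the comparison, while the built-in "norm" algorithm covers the standard L2 case.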