froody opened this issue 6 months ago
Hello @froody,
Good catch. `adaptive_grad_clip` does hide some logic that could mislead users.
If you are willing to open a PR that lets this function handle `ndim=5`, that would be great. I'm not sure exactly how to add an `axis` argument while keeping the current default behavior, so I'll let you try and see :)
Thank you!
`unitwise_norm`, used by `adaptive_grad_clip`, only supports a few values of `ndim` and raises `ValueError` when applied to a conv3d kernel, whose `ndim` is 5 (HWDIO layout). Would it be acceptable to add an optional `axis` kwarg to `adaptive_grad_clip` and `unitwise_norm`? This would let callers specify the reduction axes at the call site instead of baking every possible layout into the implementation of `unitwise_norm`. I'm happy to submit a PR.
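A minimal sketch of the proposal, written directly with `jax.numpy` rather than against optax itself. Both `unitwise_norm` (re-implemented here, not imported from optax) and the helper `clip_leaf` are hypothetical, and the `axis=None` branches assume that the current defaults cover scalars/vectors, 2D/3D linear weights and 4D HWIO conv kernels; check them against the actual optax source before reusing this. Passing `axis=None` keeps the ndim-based behavior, while an explicit `axis` handles layouts such as HWDIO.

```python
from typing import Optional, Sequence, Union

import jax
import jax.numpy as jnp

# Reduction axes: None keeps the ndim-based defaults (hypothetical API).
Axis = Optional[Union[int, Sequence[int]]]


def unitwise_norm(x: jax.Array, axis: Axis = None) -> jax.Array:
    """Per-unit L2 norm of `x`, broadcast back to `x.shape`.

    Hypothetical re-implementation, not optax's function. With axis=None
    it follows the assumed defaults; an explicit axis overrides them.
    """
    if axis is None:
        if jnp.squeeze(x).ndim <= 1:  # scalars and vectors: one norm overall
            axis = tuple(range(x.ndim))
        elif x.ndim in (2, 3):  # linear (IO) or multihead linear weights
            axis = 0
        elif x.ndim == 4:  # conv2d kernels in HWIO layout
            axis = (0, 1, 2)
        else:  # e.g. ndim=5 conv3d kernels: the caller must choose the axes
            raise ValueError(f"Pass `axis` explicitly for shape {x.shape}.")
    squared_norm = jnp.sum(jnp.abs(x) ** 2, axis=axis, keepdims=True)
    return jnp.broadcast_to(jnp.sqrt(squared_norm), x.shape)


def clip_leaf(grad, param, clipping=0.01, eps=1e-3, axis: Axis = None):
    """Adaptive gradient clipping of one array (hypothetical helper)."""
    g_norm = unitwise_norm(grad, axis)
    p_norm = unitwise_norm(param, axis)
    max_norm = clipping * jnp.maximum(p_norm, eps)
    # Rescale only the units whose gradient norm exceeds max_norm; the
    # 1e-6 floor avoids dividing by zero for all-zero gradient units.
    scale = max_norm / jnp.maximum(g_norm, 1e-6)
    return jnp.where(g_norm > max_norm, grad * scale, grad)


# Usage: a conv3d kernel in HWDIO layout, reducing over H, W, D and I.
kernel = jnp.ones((3, 3, 3, 4, 8))
grad = 10.0 * jnp.ones_like(kernel)
clipped = clip_leaf(grad, kernel, axis=(0, 1, 2, 3))
```

Defaulting `axis` to `None` means existing callers see no change, and only layouts the defaults cannot guess (such as HWDIO) need an explicit value. To clip a whole parameter tree, one could map `clip_leaf` over the gradient and parameter trees with `jax.tree_util.tree_map`, passing a per-leaf `axis` where the defaults do not apply; the `clipping` and `eps` values above are illustrative only.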