Issue #, if available: N/A
Description of changes:
Update the torch dependency to >= 2.0. Because gradient clipping in torch 2.0 no longer supports the SparseCUDA backend, I include the older implementation of `torch.nn.utils.clip_grad_norm_` from `torch==1.13`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
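
For reviewers, here is a minimal sketch of what the per-tensor clipping logic from torch 1.13 looks like. It is an illustration reconstructed from memory of the 1.13 implementation, not the exact code in this PR. The name `legacy_clip_grad_norm_` is hypothetical, and the example below only exercises dense gradients; behavior on sparse CUDA gradients is assumed, not verified here.

```python
import math
from typing import Iterable, Union

import torch


def legacy_clip_grad_norm_(
    parameters: Union[torch.Tensor, Iterable[torch.Tensor]],
    max_norm: float,
    norm_type: float = 2.0,
) -> torch.Tensor:
    """Clip gradients in place, one tensor at a time (torch 1.13 style).

    Hypothetical helper name; the per-tensor loop avoids the fused
    multi-tensor path used by newer torch releases.
    """
    if isinstance(parameters, torch.Tensor):
        parameters = [parameters]
    grads = [p.grad for p in parameters if p.grad is not None]
    if len(grads) == 0:
        return torch.tensor(0.0)

    max_norm = float(max_norm)
    norm_type = float(norm_type)
    device = grads[0].device

    if math.isinf(norm_type):
        # Infinity norm: largest absolute gradient entry across all tensors.
        norms = [g.detach().abs().max().to(device) for g in grads]
        total_norm = norms[0] if len(norms) == 1 else torch.max(torch.stack(norms))
    else:
        # p-norm of the per-tensor p-norms equals the p-norm of all entries.
        per_tensor = [torch.norm(g.detach(), norm_type).to(device) for g in grads]
        total_norm = torch.norm(torch.stack(per_tensor), norm_type)

    # Scale factor is clamped to 1.0 so small gradients are left unchanged.
    clip_coef = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)
    for g in grads:
        g.detach().mul_(clip_coef.to(g.device))
    return total_norm
```

A call such as `legacy_clip_grad_norm_(model.parameters(), max_norm=1.0)` would then stand in for `torch.nn.utils.clip_grad_norm_` in the training loop, where `model` is whatever module is being trained.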