PyTorch 2.1 has a known bug that prevents training in float64: https://discuss.pytorch.org/t/tensors-of-the-same-index-must-be-on-the-same-device-and-the-same-dtype-except-step-tensors-that-can-be-cpu-and-float32-notwithstanding/190335

There are two workarounds:

- Pass `foreach=False` to the optimizer, but this degrades computational performance (see the first sketch below).
- Upcast all tensors to float64 by hand instead of calling `torch.set_default_dtype(torch.float64)` (see the second sketch below).

Since PyTorch seems to fix this in the next release, I recommend just applying the first workaround. If it is not fixed in 2.2, then we should try the second one.
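A minimal sketch of the first workaround; the `Linear` model and `Adam` optimizer here are placeholders for illustration, not taken from the note above:

```python
import torch

# Keep the convenient global default dtype; the bug is triggered by the
# optimizer's foreach (multi-tensor) path, not by this call itself.
torch.set_default_dtype(torch.float64)

model = torch.nn.Linear(4, 1)  # params are created in float64

# foreach=False falls back to the per-parameter loop implementation,
# which reportedly avoids the dtype-mismatch error at the cost of
# slower optimizer steps.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, foreach=False)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```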
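If 2.2 still ships with the bug, the second workaround would look roughly like this; again the model, optimizer, and data are assumed stand-ins:

```python
import torch

# Leave the default dtype at float32 and upcast the model and inputs
# explicitly, so the optimizer's internal bookkeeping tensors keep the
# dtypes PyTorch 2.1 expects.
model = torch.nn.Linear(4, 1).to(torch.float64)  # upcast parameters
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4).to(torch.float64)  # upcast inputs by hand too
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```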