Open NaNillll opened 2 years ago
Thanks for your analysis. We also need to study the reasons carefully.
@NaNillll @RainbowRui @Juyong Yes, I have the same problem. My version is pytorch==1.11. Have you solved it?
This issue can be resolved by using PyTorch 1.12 or later, or by setting:
torch.backends.cuda.matmul.allow_tf32 = False
TensorFloat-32 (TF32) on Ampere devices
Starting in PyTorch 1.7, there is a new flag called allow_tf32. This flag defaults to True in PyTorch 1.7 to PyTorch 1.11, and False in PyTorch 1.12 and later. This flag controls whether PyTorch is allowed to use the TensorFloat32 (TF32) tensor cores, available on new NVIDIA GPUs since Ampere, internally to compute matmul (matrix multiplies and batched matrix multiplies) and convolutions.
TF32 tensor cores are designed to achieve better performance on matmul and convolutions on torch.float32 tensors by rounding input data to have 10 bits of mantissa, and accumulating results with FP32 precision, maintaining FP32 dynamic range.
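To get a feel for what "10 bits of mantissa" means in practice, here is a rough pure-Python sketch that emulates the input rounding. The helper names `round_to_tf32` and `tf32_dot` are made up for illustration, and the rounding mode (round-to-nearest-even) is an assumption; real hardware may round differently. It is not a claim about what exactly goes wrong in this repository, only a picture of how a small difference between inputs can vanish.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Emulate TF32 input rounding for a finite, normal float32 value.

    Assumption: round-to-nearest-even on the 13 dropped mantissa bits.
    """
    # Reinterpret the value as its 32-bit IEEE 754 float32 bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 stores 23 mantissa bits; keeping 10 means dropping the low 13.
    bits += 0x0FFF + ((bits >> 13) & 1)  # bias for round-to-nearest-even
    bits &= 0xFFFFE000                   # clear the 13 dropped bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def tf32_dot(a, b):
    """Dot product with TF32-rounded inputs.

    The sum is accumulated in Python floats, which is wider than the
    FP32 accumulation described above; only the input rounding is emulated.
    """
    return sum(round_to_tf32(x) * round_to_tf32(y) for x, y in zip(a, b))


a = [1.0001, -1.0]
b = [1.0, 1.0]
print(sum(x * y for x, y in zip(a, b)))  # about 1e-4 in full precision
print(tf32_dot(a, b))                    # 0.0, since 1.0001 rounds to 1.0
```

With only 10 mantissa bits, values near 1.0 are spaced about 0.001 apart, so the 1e-4 difference in the example is lost before the multiply happens.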
It helps me a lot. When I install a different torch version, I cannot reproduce the author's results with their examples: the generated landmarks do not match the input images. After adding this setting in cariface.py, the loss becomes very small and the generated landmarks match the input images.
I tried torch==1.7 and torch==1.9 and ran test.sh as the README describes. There were no warnings, but the results are very bad:
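For anyone who wants to apply the setting without editing much, a minimal sketch of a helper that could be called once at startup, before any model code runs. The function name `disable_tf32` is hypothetical, and switching off `torch.backends.cudnn.allow_tf32` as well is my own addition (it is the separate PyTorch flag for cuDNN convolutions), not something this thread confirmed was needed.

```python
def disable_tf32() -> bool:
    """Turn off TF32 for matmul and cuDNN convolutions if PyTorch is installed.

    Returns True when the flags were set, False when torch is not importable.
    """
    try:
        import torch
    except ImportError:
        return False

    # The flag quoted in this thread: controls TF32 use in matmul.
    torch.backends.cuda.matmul.allow_tf32 = False
    # Assumption (not from this thread): also disable TF32 in cuDNN convolutions.
    torch.backends.cudnn.allow_tf32 = False
    return True


if __name__ == "__main__":
    print("TF32 disabled:", disable_tf32())
```

Calling it near the top of the entry script should be enough, since these are global flags rather than per-tensor settings.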
Same as #32
However, when I switch to torch==1.4, the result is as good as the paper shows.
I have spent so much time on this problem but cannot find out why.
It is so strange...