High version Pytorch might lead to totally wrong result

NaNillll commented 2 years ago

I tried torch==1.7 and torch==1.9 and ran test.sh as the README describes. There were no warnings, but the results are very bad:

[Image: 微信图片_20220210202036 (WeChat image)]

Same as #32

However, when I switch to torch==1.4, the results are as good as the paper shows.

I have spent a lot of time on this problem but cannot find the cause.

It is so strange......

RainbowRui commented 2 years ago

Thanks for your analysis. We also need to study the reasons carefully.

ForrestPi commented 2 years ago

@NaNillll @RainbowRui @Juyong Yes, I have the same problem. My version is pytorch==1.11. Have you solved it?

beanrac commented 1 year ago

This issue can be resolved by using PyTorch 1.12 or later, or by setting:

torch.backends.cuda.matmul.allow_tf32 = False

TensorFloat-32 (TF32) on Ampere devices

Starting in PyTorch 1.7, there is a new flag called allow_tf32. This flag defaults to True in PyTorch 1.7 to PyTorch 1.11, and False in PyTorch 1.12 and later. This flag controls whether PyTorch is allowed to use the TensorFloat32 (TF32) tensor cores, available on new NVIDIA GPUs since Ampere, internally to compute matmul (matrix multiplies and batched matrix multiplies) and convolutions.

TF32 tensor cores are designed to achieve better performance on matmul and convolutions on torch.float32 tensors by rounding input data to have 10 bits of mantissa, and accumulating results with FP32 precision, maintaining FP32 dynamic range.
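To get a feel for what "10 bits of mantissa" costs, here is a rough CPU-only sketch that rounds a float32 value to TF32-like input precision by dropping the 13 low mantissa bits. The helper name `round_to_tf32` and the sample value 123.456 are hypothetical, and the rounding rule (round to nearest, ties away from zero) is an assumption that may not match the hardware exactly. NaN and infinity are not handled.

```python
import struct


def round_to_tf32(x: float) -> float:
    """Round x (taken as float32) to 10 explicit mantissa bits.

    Hypothetical helper for illustration only; it is not a PyTorch API.
    """
    # Reinterpret the float32 value as its raw 32-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # float32 stores 23 mantissa bits and TF32 keeps 10, so 13 bits go away.
    # Adding half of the dropped range (0x1000) before masking rounds the
    # magnitude to nearest instead of simply truncating it.
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Near 1.0 the spacing of representable values grows from 2**-23 to 2**-10.
print(round_to_tf32(1.0 + 2**-12))  # 1.0
# A made-up coordinate-like value: spacing in [64, 128) is 0.0625.
print(round_to_tf32(123.456))  # 123.4375
```

So each matmul input may carry a relative error of up to roughly 2**-11, which is far coarser than float32. Whether that alone explains the bad results here probably depends on how sensitive the model's computations are to input rounding.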

Vincent-ZHQ commented 11 months ago

This helped me a lot. With a different torch version installed, I could not reproduce the authors' results with their examples: the generated landmarks did not match the input images. After adding this setting in cariface.py, the loss becomes very small and the generated landmarks match the input images.
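For anyone applying the same fix, a minimal sketch of how the setting might be placed near the top of the entry script, before any model is built or run. The wrapper name `disable_tf32` is hypothetical. The second flag, `torch.backends.cudnn.allow_tf32`, is a separate PyTorch switch for cuDNN convolutions that was not mentioned in this thread. Turning it off as well is an assumption on my part, in case convolutions are also affected.

```python
import torch


def disable_tf32() -> None:
    """Force full float32 precision (hypothetical helper, not part of the repo)."""
    # The flag from the fix in this thread: TF32 in cuBLAS matmuls.
    torch.backends.cuda.matmul.allow_tf32 = False
    # Assumption, not from the thread: also disable TF32 in cuDNN convolutions.
    torch.backends.cudnn.allow_tf32 = False


# Call once at startup, before running inference or training.
disable_tf32()
```

Both flags are plain global settings, so they only need to be set once per process. The trade-off is slower matmuls and convolutions on Ampere or newer GPUs.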

Juyong / CaricatureFace

High version Pytorch might lead to totally wrong result #35