Hi,
I observed a 'nan' loss when using an 'RTX A6000 Ada' GPU and training ByteFormer with the config file 'examples/byteformer/imagenet_file_encodings/encoding_type=TIFF.yaml'.
The nan loss still appeared after switching the GPU to an 'RTX 4090'.
Did you also see the 'nan' loss when training ByteFormer on ImageNet, or did training run cleanly on your side?
The module versions used for training:
cvnets 0.3
torch 1.13.1+cu117
torchaudio 0.13.1+cu117
torchtext 0.14.1
torchvision 0.14.1+cu117
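In case it helps narrow things down, below is a minimal, generic PyTorch sketch of how one can catch the first step at which the loss becomes nan (this is an illustrative toy loop, not the actual cvnets/ByteFormer training entry point; the model, data, and loss here are placeholders):

```python
import torch

# Report the backward op that first produces nan/inf (slows training; debug only).
torch.autograd.set_detect_anomaly(True)

# Placeholder model and data, standing in for the real ByteFormer training loop.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    # Fail fast on the first nan loss so the offending step/batch is known.
    if torch.isnan(loss):
        raise RuntimeError(f"nan loss first observed at step {step}")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: loss is finite = {torch.isfinite(loss).item()}")
```

With anomaly detection enabled, the traceback points at the forward op whose backward produced the nan, which can help distinguish a data/encoding issue from a numerical one (e.g. mixed-precision overflow).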