apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

'nan' loss when training 'ByteFormer' using ImageNet #103

Open minguinho26 opened 8 months ago

minguinho26 commented 8 months ago

Hi,

I observed a NaN loss when training ByteFormer on an 'RTX A6000 ada' GPU with the config file 'examples/byteformer/imagenet_file_encodings/encoding_type=TIFF.yaml'.

The NaN loss persisted after switching the GPU to an 'RTX 4090'.
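A report like this is easier to act on if it says at which training step the loss first becomes non-finite. Below is a minimal sketch of a loss monitor that records the most recent loss values and the first step at which NaN or inf appears. The `LossMonitor` class and its `update` method are hypothetical helpers, not part of cvnets; in a PyTorch loop you would pass `loss.item()` to it.

```python
import math
from collections import deque


class LossMonitor:
    """Hypothetical helper: tracks recent losses and the first non-finite one."""

    def __init__(self, window: int = 5):
        self.recent = deque(maxlen=window)  # last `window` loss values seen
        self.first_bad_step = None          # step at which NaN/inf first appeared

    def update(self, loss_value: float, step: int) -> bool:
        """Record the loss; return True if it is finite, False otherwise."""
        self.recent.append(loss_value)
        if not math.isfinite(loss_value):
            if self.first_bad_step is None:
                self.first_bad_step = step
            return False
        return True
```

Logging `first_bad_step` together with `recent` shows whether the loss diverges gradually (pointing at learning rate or mixed-precision overflow) or jumps to NaN in a single step (pointing at a specific batch).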

Did you encounter a NaN loss when training ByteFormer on ImageNet?

The package versions used for training are:

- cvnets 0.3
- torch 1.13.1+cu117
- torchaudio 0.13.1+cu117
- torchtext 0.14.1
- torchvision 0.14.1+cu117
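A version list like the one above can be generated with a short script instead of being copied by hand. This is a minimal sketch using the standard-library `importlib.metadata`; the `package_versions` function is a hypothetical helper, and the package names are the ones listed in this report.

```python
from importlib import metadata

# Distribution names taken from the version list in this report.
PACKAGES = ["cvnets", "torch", "torchaudio", "torchtext", "torchvision"]


def package_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions(PACKAGES).items():
        print(f"{name} {version or 'not installed'}")
```

For GPU-specific issues, `torch.version.cuda` and `torch.backends.cudnn.version()` are also worth including alongside these versions.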