Training questions about the train part of main res.py

ThomasWestfechtel / SKD

Combining inherent knowledge of vision-language models with unsupervised domain adaptation through self-knowledge distillation


Training questions about the train part of main res.py #1

Open straw66 opened 10 months ago

straw66 commented 10 months ago

Hi author, I would like to ask why GPU utilization is high at the beginning of training but drops dramatically after "Start Training!" is printed. I changed all the num_workers values in main_res to 16 and batch_size to 64 to try to find out what is wrong and how to fix it. The device I use is an RTX 3070.

Screenshot of the initial runtime: [image]

Screenshot after "Start Training!" is printed: [image]

ThomasWestfechtel commented 9 months ago

My guess is that it is due to the reparametrization of the batchnorm layers of the ResNet backbone. To test this, you could change the backbone to ViT-base. ViT doesn't use batchnorm layers, so the spike in GPU utilization shouldn't happen with it.
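
Separately from the backbone swap, it may help to check whether the GPU is simply waiting for data once the training loop starts. The following is a minimal sketch, not code from the SKD repository: `profile_loop`, `loader`, and `step_fn` are hypothetical names, and the helper only splits wall-clock time per iteration into "waiting for the next batch" and "running the step".

```python
import time


def profile_loop(loader, step_fn, max_steps=50):
    """Split per-iteration wall-clock time into data wait and step time.

    loader  : any iterable that yields batches (hypothetical name)
    step_fn : callable that runs one training step on a batch (hypothetical name)

    Note: CUDA kernels run asynchronously, so when timing a real GPU step,
    step_fn should end with torch.cuda.synchronize(); otherwise the step
    time is under-reported and shows up as data wait instead.
    """
    data_time = 0.0
    step_time = 0.0
    steps = 0
    it = iter(loader)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is the wait for data
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent here is the training step itself
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        steps += 1
    total = data_time + step_time
    return {
        "steps": steps,
        "data_time": data_time,
        "step_time": step_time,
        "data_fraction": data_time / total if total > 0 else 0.0,
    }
```

If `data_fraction` comes out close to 1, the loop spends most of its time waiting for batches, which would point at the data pipeline rather than the model. If it is close to 0, the time is going into the step itself, and the backbone test suggested above is the more useful next check.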