straw66 opened this issue 10 months ago (status: Open)
My guess is that it is due to the reparametrization of the batch-norm layers in the ResNet backbone. To test this, you could switch the backbone to ViT-Base: ViT doesn't use batch-norm layers, so the spike in GPU utilization shouldn't occur with it.
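One quick way to check the hypothesis above is to verify which backbone actually contains batch-norm layers before swapping. A minimal sketch (the two toy models below are illustrative stand-ins, not the repo's actual ResNet/ViT backbones):

```python
import torch.nn as nn

def has_batchnorm(model: nn.Module) -> bool:
    # True if any submodule is a batch-norm variant (BatchNorm1d/2d/3d, SyncBatchNorm).
    return any(isinstance(m, nn.modules.batchnorm._BatchNorm) for m in model.modules())

# Toy stand-ins: ResNet-style blocks use BatchNorm, ViT-style blocks use LayerNorm.
resnet_like = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
vit_like = nn.Sequential(nn.Linear(16, 8), nn.LayerNorm(8), nn.GELU())

print(has_batchnorm(resnet_like))  # True
print(has_batchnorm(vit_like))     # False
```

If `has_batchnorm` returns False for the swapped-in backbone and the GPU-utilization spike disappears, that would support the batch-norm reparametrization explanation.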
Hi author, I would like to ask why GPU utilization is high at the beginning of training but drops dramatically after `Start Training!` is printed. I changed all the num_workers in main_res to 16 and batch_size to 64. What might be wrong, and how can I fix it? The device I use is an RTX 3070. Screenshot of the initial run: [screenshot] Screenshot of the situation after `Start Training!` is printed: [screenshot]
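For reference, the change described above (num_workers=16, batch_size=64) would correspond to a DataLoader configured roughly like this. This is a hedged sketch with a dummy dataset, since `main_res` itself isn't shown in the issue:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for the real training data.
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))

# Settings as described in the issue: 16 worker processes, batches of 64.
loader = DataLoader(dataset, batch_size=64, num_workers=16, pin_memory=True)

print(len(loader))  # 256 samples / batch_size 64 = 4 batches
```

Note that a drop in GPU utilization after startup is often a data-loading bottleneck rather than a model issue; on an 8-core machine, num_workers=16 can oversubscribe the CPU and slow the input pipeline down.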