Hi, I am running experiments with ConvNeXt-Base and ViT-B on ImageNet-1k, which the paper says have roughly the same throughput (images/sec) and a similar number of parameters. However, my training is about 2x slower for ConvNeXt-B. I am using the ConvNeXt-B code from this repo, and my own PyTorch Lightning dataloading pipeline for both ConvNeXt-B and ViT-B.
I use AMP and 224x224 inputs for both, and the same hyperparameters for pretty much everything else (mostly the original papers' hyperparameters for each model). I used batch size 128 for ViT and 256 for ConvNeXt, with learning rates scaled to match the papers. On 8 A100s, ConvNeXt takes about 15.5 min/epoch, while ViT takes about 8 min/epoch. I am trying to think of any reason for the discrepancy.
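(By scaled learning rates I mean the standard linear scaling rule; the base values in this sketch are placeholders, not the papers' exact numbers.)

```python
# Linear LR scaling rule: lr grows proportionally with the total batch size.
# base_lr and reference_batch_size are placeholders, not the papers' values.
def scaled_lr(base_lr: float, total_batch_size: int, reference_batch_size: int) -> float:
    return base_lr * total_batch_size / reference_batch_size

# e.g. ConvNeXt at 256 per GPU across 8 GPUs, against a placeholder reference:
print(scaled_lr(4e-3, 256 * 8, 4096))  # -> 0.002
```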
Some other things I tried:

- Setting cudnn.benchmark = True, which did not change anything.
- Running on a single GPU (a local T4) with the exact same batch size for both models; ConvNeXt is still about 1.7x slower (see the timing sketch below).
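For what it's worth, here is a minimal standalone step-time check on synthetic data (so my Lightning dataloading is out of the equation, and any remaining gap is the models themselves); the timm model names are stand-ins for the actual repo code I am training with:

```python
import time

import timm
import torch
import torch.nn.functional as F

def train_step_ms(name, batch_size=128, steps=20, warmup=5):
    model = timm.create_model(name, num_classes=1000).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scaler = torch.cuda.amp.GradScaler()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    y = torch.randint(0, 1000, (batch_size,), device="cuda")
    for i in range(steps):
        if i == warmup:  # skip the first iterations (cudnn autotune, allocator warmup)
            torch.cuda.synchronize()
            start = time.time()
        opt.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():  # AMP, as in my training runs
            loss = F.cross_entropy(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()
    torch.cuda.synchronize()
    return 1000 * (time.time() - start) / (steps - warmup)

for name in ("convnext_base", "vit_base_patch16_224"):
    print(name, f"{train_step_ms(name):.1f} ms/step")
```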
Did anybody else experience something similar?

Thanks!

Did you measure the inference speed, and does it match the paper's relative numbers? The training speed comparison could come out differently, and it also depends on the hardware and library versions used.
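Something like this rough forward-only check (untested sketch; the timm model names are an assumption on my side, substitute the exact model classes you are training) should tell you whether pure inference throughput matches the paper:

```python
import time

import timm
import torch

# Inference throughput in the paper's setting: batch inference, 224x224, AMP.
@torch.no_grad()
def throughput(name, batch_size=256, steps=30, warmup=10):
    model = timm.create_model(name, num_classes=1000).cuda().eval()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    with torch.cuda.amp.autocast():
        for i in range(steps):
            if i == warmup:  # exclude warmup iterations from the measurement
                torch.cuda.synchronize()
                start = time.time()
            model(x)
    torch.cuda.synchronize()
    return batch_size * (steps - warmup) / (time.time() - start)

for name in ("convnext_base", "vit_base_patch16_224"):
    print(name, f"{throughput(name):.0f} img/s")
```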