Hi,
Thanks for the great work. I tried training an MViTv2 model on IN-1K, but my Top-1 accuracy is ~0.5 points lower than reported. As far as I can tell, the only differences between my training setup and the paper's are a smaller batch size (1024 instead of 2048) and mixed-precision training being enabled. Do you think these could account for the performance gap? Thanks!
Best, Junwei