First, thank you for releasing the code for this excellent work. I have a couple of observations and questions that I hope you can respond to.
It appears that the reported 645 MFLOPs is measured on a per-frame basis. That would imply that, for the 243-frame model, a forward pass costs 645 × 243 MFLOPs. Is this correct?
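To make the per-frame reading above concrete, here is the arithmetic as a small sketch. It assumes the 645 MFLOPs figure really is a per-frame cost that scales linearly with the number of frames, which is exactly the point I am asking you to confirm; `forward_flops_m` is just a helper name I made up for illustration.

```python
# Assumption: 645 MFLOPs is a per-frame cost and total cost scales linearly with T.
FLOPS_PER_FRAME_M = 645  # reported figure, in millions of FLOPs


def forward_flops_m(num_frames: int, per_frame_m: int = FLOPS_PER_FRAME_M) -> int:
    """Estimated forward-pass cost in MFLOPs under the per-frame assumption."""
    return per_frame_m * num_frames


total_m = forward_flops_m(243)
print(f"{total_m} MFLOPs = {total_m / 1000:.1f} GFLOPs")
# prints "156735 MFLOPs = 156.7 GFLOPs"
```

If that reading is right, the 243-frame model would need roughly 157 GFLOPs per forward pass.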
I am also finding that, when training a T=27 model on two RTX 3090s (48 GB of VRAM in total), the largest batch size I can use is 200; anything larger causes out-of-memory errors. Can you please tell us which GPUs you trained on to support a batch size of 1024?
I am wondering whether the small gains in accuracy are outweighed by the extreme computational overhead, which seems to make training almost intractable for most users.