Hi author, I replaced the dataset with the smaller HMDB51 dataset and, after some experimentation, found that training with 16 frames is less accurate than training with 8 frames:
16 frames: 70.33%
8 frames: 71.96%
We find this result surprising!
The backbone we used is ViT-B/16.
Due to our limited experimental environment, we trained ILA on a single 3090 GPU with the following parameters:
lr = 8e-6, batch size = 4
We also tuned the learning rate and found that ILA seems to be very sensitive to it: when we increased the learning rate by a factor of 10, the accuracy dropped to only about 15%.
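
To narrow down where the accuracy starts to collapse, one option would be to try a few learning rates between the two settings instead of jumping a full factor of 10. Below is a minimal sketch of how such a grid could be generated. It assumes the 10x increase was from 8e-6 to 8e-5; the helper `log_spaced_lrs` and the choice of 5 points are hypothetical and not part of the ILA code.

```python
def log_spaced_lrs(low, high, num):
    """Return `num` learning rates evenly spaced on a log scale from low to high."""
    if num < 2:
        raise ValueError("num must be at least 2")
    if low <= 0 or high <= 0:
        raise ValueError("learning rates must be positive")
    # Constant multiplicative step between neighbouring candidates.
    ratio = (high / low) ** (1.0 / (num - 1))
    return [low * ratio ** i for i in range(num)]


# Assumed endpoints: the lr reported above and 10x that value.
candidates = log_spaced_lrs(8e-6, 8e-5, 5)
for lr in candidates:
    print(f"lr = {lr:.2e}")
```

Each candidate would then be used for one short training run with everything else (batch size 4, ViT-B/16, number of frames) kept fixed, so that any accuracy change can be attributed to the learning rate alone.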