Open hubaak opened 2 weeks ago
Thank you for the question.
Here are our settings: batch size = 64, lr = 1e-4, scheduler = step_LR, step = 40, decay_ratio = 0.1, optimizer = sgd, weight_decay = 1e-4
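For anyone trying to reproduce this, here is a minimal sketch of the learning rate these settings imply. It assumes that `step = 40` is counted in epochs and that the schedule follows the usual step-decay rule (the same formula as PyTorch's `StepLR`). Neither is confirmed in the reply above, and the function name `step_lr` is only illustrative.

```python
# Assumed interpretation of the posted settings (not confirmed by the authors):
# the learning rate is multiplied by decay_ratio once every `step` epochs.
BASE_LR = 1e-4      # lr
STEP = 40           # step (assumed to be in epochs)
DECAY_RATIO = 0.1   # decay_ratio


def step_lr(epoch, base_lr=BASE_LR, step=STEP, decay_ratio=DECAY_RATIO):
    """Return the learning rate in effect during a 0-indexed epoch."""
    return base_lr * decay_ratio ** (epoch // step)


if __name__ == "__main__":
    # Print the rate just before and just after each decay boundary.
    for epoch in (0, 39, 40, 79, 80):
        print(f"epoch {epoch:3d}: lr = {step_lr(epoch):.1e}")
```

Under this reading the rate stays at 1e-4 for the first 40 epochs and drops by a factor of 10 every 40 epochs after that.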
We use an ImageNet pre-trained ResNet18 as the backbone. For the RGB modality, we evenly pick 3 frames for each sample. For the optical flow modality, we stack the horizontal vector u and the vertical vector v as [u, v, u] to form one three-channel frame, and select 3 frames in total.
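The frame selection and flow stacking described above could be sketched roughly as follows. This is one possible reading, not the authors' code: "evenly pick" is assumed to mean taking the midpoint of each of 3 equal segments, the flow components are assumed to be 2-D arrays of shape (H, W), and the names `evenly_spaced_indices` and `stack_flow` are hypothetical.

```python
import numpy as np


def evenly_spaced_indices(num_frames, num_picks=3):
    """Pick the midpoint index of each of `num_picks` equal segments.

    Assumption: this is one way to "evenly pick" frames; the exact
    rule used by the authors is not stated.
    """
    if num_frames < num_picks:
        raise ValueError("video has fewer frames than requested picks")
    # Integer arithmetic avoids floating-point rounding surprises.
    return [((2 * i + 1) * num_frames) // (2 * num_picks)
            for i in range(num_picks)]


def stack_flow(u, v):
    """Stack flow components as [u, v, u] into a (3, H, W) array.

    Repeating u fills the third channel so the result matches the
    3-channel input expected by an ImageNet-pretrained backbone.
    """
    return np.stack([u, v, u], axis=0)


if __name__ == "__main__":
    print(evenly_spaced_indices(30))   # indices for a 30-frame clip
    u = np.zeros((224, 224), dtype=np.float32)
    v = np.ones((224, 224), dtype=np.float32)
    print(stack_flow(u, v).shape)
```

How the 3 selected frames are fed to the 2D backbone (for example, averaging per-frame features or predictions) is not specified in the reply, so it is left out here.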
Thanks a lot for providing your settings! I'll try again with these settings.
In the paper, an ImageNet-pretrained ResNet18 model achieves a score of 77.2 with only the RGB modality. However, there is no code for UCF101 in the repo. I tried to train a ResNet18 according to the settings in the paper, and its accuracy is 0.43 with (batch_size, lr, epoch) = (32, 1e-3, 800). So I'm confused by such a performance gap. Could you provide some implementation details or the code for UCF101? BTW, a 3D ResNet18 with a lot of tricks has a score of 74.1 in https://arxiv.org/pdf/2103.05905v2, so I think it's a little weird for a ResNet18 with only the RGB modality to reach that performance so easily.