patykov closed this issue 5 years ago
Hey there! Have you figured out why it's happening? I'm facing the same problem.
Hi! Actually I did; it was a combination of factors, really. First, I forgot to enable the use of the BN running mean/var during evaluation (my bad!). Second, since we are copying the pre-trained weights, the non-local block operations must be exactly the same as the ones the authors used, and the common PyTorch implementation of the non-local block found online differs slightly from theirs. You should add a 'scale' operation, as in https://github.com/facebookresearch/video-nonlocal-net/blob/c253ed5eb004fa2cae0490ed46f5018a3c3b060f/lib/models/nonlocal_helper.py#L86, since they followed section 3.2.1 of https://arxiv.org/pdf/1706.03762.pdf. So in the forward function of my non-local block I add:
```python
...
f = torch.matmul(theta_x, phi_x)
f_sc = f * (self.inter_channels ** -0.5)  # scale, https://arxiv.org/pdf/1706.03762.pdf section 3.2.1
f_div_C = F.softmax(f_sc, dim=-1)
...
```
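On the first point (the BN statistics): at test time the model must be switched to eval mode so that the BatchNorm layers use their running mean/var instead of batch statistics. A minimal reminder (`clip` here is a hypothetical input tensor):

```python
model.eval()  # BatchNorm now uses running_mean/running_var instead of batch stats
with torch.no_grad():
    logits = model(clip)
```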
And finally, you should move the max-pooling layer so that it is applied before the phi and g operations:
```python
self.g = nn.Sequential(max_pool_layer, self.g)
self.phi = nn.Sequential(max_pool_layer, self.phi)
```
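For reference, here is a minimal sketch of what the whole forward pass looks like with both changes in place. The attribute names (`self.g`, `self.theta`, `self.phi`, `self.W`, `self.inter_channels`) follow the common PyTorch non-local block implementations and are assumptions here; this also assumes `self.g` and `self.phi` already have the max-pooling folded in as above, and that `self.W` projects back to the input channel count:

```python
def forward(self, x):
    # x: (b, c, t, h, w)
    b = x.size(0)

    g_x = self.g(x).view(b, self.inter_channels, -1)           # pooled: (b, c', thw_pooled)
    g_x = g_x.permute(0, 2, 1)                                 # (b, thw_pooled, c')

    theta_x = self.theta(x).view(b, self.inter_channels, -1)   # (b, c', thw)
    theta_x = theta_x.permute(0, 2, 1)                         # (b, thw, c')
    phi_x = self.phi(x).view(b, self.inter_channels, -1)       # pooled: (b, c', thw_pooled)

    f = torch.matmul(theta_x, phi_x)                           # (b, thw, thw_pooled)
    f_sc = f * (self.inter_channels ** -0.5)                   # the scale fix
    f_div_C = F.softmax(f_sc, dim=-1)

    y = torch.matmul(f_div_C, g_x)                             # (b, thw, c')
    y = y.permute(0, 2, 1).contiguous().view(b, self.inter_channels, *x.size()[2:])
    return self.W(y) + x                                       # residual connection
```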
These modifications solved the problem for me, hope it helps!
Awesome! This definitely helps me a lot, appreciated! Also, I saw you starred this repo. That's the baseline code you use, right? I ran the Kinetics validation set through that code and weights and got an accuracy about 10% lower than reported. (I've tried several other PyTorch I3D implementations and none of them work. It's driving me crazy 😭) Have you run the validation set on that code? Or is there something I need to pay attention to in the video2jpg step, the data preprocessing (torchvision.transforms), or something else? Thanks a lot!
Hey! I figured it out! @patykov
It's because of glob.glob(): it returns paths in arbitrary filesystem order, so after globbing, the frames must be sorted with frames.sort(). Otherwise the frames are effectively shuffled rather than in temporal order.
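In case it helps anyone else, a minimal sketch (hypothetical path; assumes the frame filenames are zero-padded so that a lexicographic sort is also a temporal sort):

```python
import glob

frames = glob.glob('kinetics/val/some_video/*.jpg')
frames.sort()  # glob.glob() returns arbitrary filesystem order; sorting restores temporal order
# If the names aren't zero-padded, sort numerically instead, e.g.:
# frames.sort(key=lambda p: int(p.split('_')[-1].split('.')[0]))
```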
@HaiyiMei @patykov I have seen your discussion above. Would you mind releasing your PyTorch implementation of the 2D-ResNet and 3D-ResNet baselines?
@HaiyiMei Thanks for your reply, and I will check it~
@HaiyiMei Hi, how did you transfer the 2D ResNet-50 weights into PyTorch? Have you tested the 2D ResNet-50 model in PyTorch?
@AlexHu123 Sorry, I didn't test the 2D ResNet-50 on Kinetics. I just used the pretrained weights that torchvision provides for image classification.
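For anyone wondering, that is just the standard torchvision model:

```python
import torchvision

# ImageNet-pretrained 2D ResNet-50 shipped with torchvision
model = torchvision.models.resnet50(pretrained=True)
model.eval()
```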
Hi! I'm working on a PyTorch implementation of your work, but my results are far below the expected ones. I've created a PyTorch version of the i3d_nonlocal model and copied the pre-trained weights from the file "i3d_nonlocal_32x2_IN_pretrain_400k.pkl" into it. I've double-checked all the layers and they seem to be equal. I'm using the Kinetics dataset you've provided and I'm applying normalization (mean, std). However, the results I'm getting for the validation set in the fully convolutional eval are 48%/75% (top1/top5).
I was wondering if anyone has an idea of what I'm doing wrong. Maybe copying the weights to PyTorch isn't as simple as relating:
and so on? I've checked the RGB/BGR input difference between the frameworks, but it didn't help.
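In case it's useful to others hitting this, here is a rough sketch of the kind of weight-copying loop involved. Everything in it is illustrative: the blob names, the name mapping, and the assumption that the .pkl stores a {'blobs': name -> ndarray} dictionary in the usual Caffe2/Detectron style:

```python
import pickle
import torch

# Illustrative Caffe2 blob name -> PyTorch parameter name mapping; the real
# mapping depends entirely on how your PyTorch model names its modules.
NAME_MAP = {
    'conv1_w': 'conv1.weight',
    'res_conv1_bn_s': 'bn1.weight',        # BN scale -> weight
    'res_conv1_bn_b': 'bn1.bias',          # BN shift -> bias
    'res_conv1_bn_rm': 'bn1.running_mean',
    'res_conv1_bn_riv': 'bn1.running_var',
    # ... one entry per layer
}

def load_caffe2_weights(model, pkl_path):
    with open(pkl_path, 'rb') as f:
        blobs = pickle.load(f, encoding='latin1')['blobs']
    state_dict = model.state_dict()
    for c2_name, pt_name in NAME_MAP.items():
        tensor = torch.from_numpy(blobs[c2_name])
        # Shape mismatches are the quickest way to catch a wrong mapping
        assert tensor.shape == state_dict[pt_name].shape, (c2_name, pt_name)
        state_dict[pt_name] = tensor
    model.load_state_dict(state_dict)
```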