Tushar-N / pytorch-resnet3d

I3D Nonlocal ResNets in Pytorch
245 stars · 39 forks

Accuracy only 53.2%! #3

Closed HaiyiMei closed 5 years ago

HaiyiMei commented 5 years ago

Hi there, great job! I have a problem though: I followed your instructions exactly, but I only got an accuracy of 53.2%. Do you have any idea why? Thanks!

HaiyiMei commented 5 years ago

By the way, that accuracy is from clip mode.

Tushar-N commented 5 years ago

Are you running exactly the same pytorch/python versions? (1.0 and 3.7)?

HaiyiMei commented 5 years ago

> Are you running exactly the same pytorch/python versions? (1.0 and 3.7)?

I'm running pytorch/python (1.1.0 and 3.6.8). I'll test with pytorch 1.0 later. Thanks!

HaiyiMei commented 5 years ago

> Are you running exactly the same pytorch/python versions? (1.0 and 3.7)?

I've switched to torch1.0.0 / torchvision0.2.2 / python3.7.3. Here's the result:

(test) A: 0.532 | clf: 2.435 | total_loss: 2.435

It still doesn't work.

Tushar-N commented 5 years ago

Did all the kinetics videos get downloaded and frames extracted properly? I have ~18434 videos in the validation set for which I reported the numbers. Also, for reference, here's a link to my exact model weights after conversion (in case something went wrong). Could you run eval.py with this file, just in case the conversion went wrong?

HaiyiMei commented 5 years ago

I've got 18310 videos in the validation set, and with your weights I only got 50% (torch1.0.0 / torchvision0.2.2 / python3.7.3 / CUDA 9.0). It's driving me crazy; I can't find anything wrong.

I'm wondering whether the preprocessing is what makes the difference. Have you ever modified the dataloader or transforms?

Also, every time I run the code in video mode, the program is killed before the first forward pass through the network, with the error below. Why is this happening? `RuntimeError: DataLoader worker (pid 18644) is killed by signal: Bus error.`

Tushar-N commented 5 years ago

I have exactly the same environment as you; I'm really not sure what's going on. I'll get some other people to test this and report whether they see similar mismatches. Until then, my only other guess is that the videos were not burst to frames in the same way. I used the command from the official activitynet crawler (link) to do this (ffmpeg 4.0.2).

Plus, every time I run the code in video mode, the program will be killed before the first forward step for the network.

Is this just a memory error? With a batch size of 8, this takes up ~8GB GPU memory.
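For what it's worth, a `DataLoader worker ... killed by signal: Bus error` usually points to shared memory (`/dev/shm`) running out in the worker processes, rather than GPU memory. A common workaround, sketched below with a dummy dataset standing in for the repo's real video dataset, is to disable worker processes entirely:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the real clip dataset: 16 clips of shape (3, 8, 224, 224).
dataset = TensorDataset(torch.zeros(16, 3, 8, 224, 224))

# num_workers=0 loads batches in the main process, avoiding the shared-memory
# segments that worker subprocesses use (at the cost of loading speed).
loader = DataLoader(dataset, batch_size=8, num_workers=0)

batch, = next(iter(loader))
print(batch.shape)  # → torch.Size([8, 3, 8, 224, 224])
```

If multiple workers are needed, enlarging `/dev/shm` (e.g. `--shm-size` when running inside Docker) is the usual alternative.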

Finally, another recent repo, mmaction, provides pytorch kinetics pretrained I3D models, so you may want to take a look at that too.

HaiyiMei commented 5 years ago

Hey! I figured it out! It's the `glob.glob()` call: the returned frame paths need to be sorted afterwards, e.g. with `frames.sort()`. Otherwise the frames come back in arbitrary filesystem order rather than temporal order.
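The fix can be sketched like this (the temp directory just simulates a burst-frame folder; the repo's actual paths and filename pattern may differ):

```python
import glob
import os
import tempfile

# Simulate a directory of burst frames with zero-padded names,
# created out of order on purpose.
frame_dir = tempfile.mkdtemp()
for i in [3, 1, 2]:
    open(os.path.join(frame_dir, f'{i:06d}.jpg'), 'w').close()

# glob.glob makes no ordering guarantee: paths come back in
# filesystem order, which may not match temporal order.
frames = glob.glob(os.path.join(frame_dir, '*.jpg'))

# Sorting restores temporal order. Zero-padded names sort correctly
# as strings; use a numeric key if the frame numbers are unpadded.
frames.sort()

print([os.path.basename(f) for f in frames])
# → ['000001.jpg', '000002.jpg', '000003.jpg']
```

Without the `sort()`, the model sees the frames of each clip in a scrambled order, which explains the large accuracy drop.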

After adding the sort step, my validation with the r50_nl model reached P1: 0.660 | P5: 0.858 in clip mode. Thank you so much! Thanks for the great work and the patient answers!

Tushar-N commented 5 years ago

Great catch! This completely slipped past me since I got lucky, and the files happened to be sorted when I ran it. Thanks for pointing it out, and sorry for the headache it caused!