cypw / PyTorch-MFNet

MIT License

Some Questions about the Training Process #17

Open VectorYoung opened 5 years ago

VectorYoung commented 5 years ago

Hi Yunpeng, I am new to video recognition tasks. I ran the code and have some questions about the whole procedure.

  1. For training, do you randomly sample 16 frames from the whole video to do classification? And could it be a different set of 16 frames for the same video each time?

  2. When I was trying to run train_hmdb51, there were many logs like 'frame[30] is error, use backup item XXX.avi'. What does this mean? Does it mean there are some errors in my video data? (I downloaded it from the official website.)

  3. It seems that train_hmdb51 does both training and evaluation after each epoch. So why do we need a separate evaluation script like evaluate_video.py for testing?

Thanks a lot for your help!

cypw commented 5 years ago

Hi @VectorYoung ,

  1. Yes, it uses random sampling.

  2. It means the data loader cannot correctly extract "frame 30" from that ".avi" file. This is caused either by a corrupted video file or simply because the current version of the data loader cannot handle that particular video file well. When such an error is raised, the data loader loads a backup video as the current training sample so that the program can keep going. The backup video is randomly selected from previously successfully sampled video clips. Regarding the HMDB51 dataset, I personally first convert the whole dataset with "ffmpeg -c:v mpeg4" (keeping the original resolution), and this procedure helps the data loader successfully load all videos without any warning/error.

  3. The testing/evaluation strategy is different. During training, the accuracy corresponds to the clip-level prediction: the program randomly samples a short clip, makes a prediction for that clip, and treats that clip-level prediction as the prediction for the entire video. "evaluate_video.py", however, samples multiple clips, averages their results, and uses the aggregated result as the prediction for the entire video, which is much more accurate. But the better result comes with a very high computational cost, and it was not affordable during training in my case.
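The multi-clip averaging described in point 3 can be sketched roughly as follows (a minimal sketch, not the repo's actual evaluate_video.py; the `video_level_prediction` name and the clip tensor layout are assumptions):

```python
import torch

def video_level_prediction(model, clips):
    # clips: list of [C, T, H, W] tensors, each a short clip sampled
    # from the same video. Average the per-clip softmax scores, then
    # take the argmax as the video-level prediction.
    model.eval()
    with torch.no_grad():
        scores = [torch.softmax(model(c.unsqueeze(0)), dim=1) for c in clips]
    return torch.cat(scores, dim=0).mean(dim=0).argmax().item()
```

Averaging softmax scores over clips smooths out unlucky clip samples, which is why the video-level number is higher than the clip-level training accuracy.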

Thanks for trying our code and sorry for the late reply.

VectorYoung commented 5 years ago

Hi @cypw , thanks a lot for your reply. I am trying to train on Kinetics-400 and found that reading from the original .mp4 videos is very slow. I found your script to convert them to .avi and tried it, but some videos failed to convert, and even more were converted to .avi yet have no frames or report 'frame[0] is error' (as I see in the training log). Did you encounter the same issue? I am trying to figure out how to properly process and read the data. Thanks a lot for your help.
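For reference, a per-video conversion helper along the lines of the "ffmpeg -c:v mpeg4" recipe mentioned above could look like this (a hypothetical sketch, not the repo's conversion script; the function names and the `-q:v 1` quality choice are my own):

```python
import os
import subprocess

def build_ffmpeg_cmd(src_path, dst_path):
    # Re-encode to MPEG-4 at the original resolution; -q:v 1 is the
    # highest variable-quality setting for the mpeg4 encoder.
    return ["ffmpeg", "-y", "-i", src_path,
            "-c:v", "mpeg4", "-q:v", "1", dst_path]

def convert_to_avi(src_path, dst_dir):
    # Returns the output path on success, or None if ffmpeg failed,
    # so broken source videos can be collected and re-downloaded.
    name = os.path.splitext(os.path.basename(src_path))[0]
    dst_path = os.path.join(dst_dir, name + ".avi")
    result = subprocess.run(build_ffmpeg_cmd(src_path, dst_path),
                            capture_output=True)
    return dst_path if result.returncode == 0 else None
```

Videos for which the helper returns None could then be logged and either re-downloaded or dropped from the training list, which should also remove the 'frame[0] is error' entries at training time.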