google-deepmind / kinetics-i3d

Convolutional neural network model for video classification trained on the Kinetics dataset.
Apache License 2.0

3D Inception-v1 Model Trained from Scratch on the UCF101 Dataset #3

Open ahkarami opened 7 years ago

ahkarami commented 7 years ago

Hi, would you please tell me the accuracy of the 3D Inception-v1 model when it is trained from scratch on the UCF101 dataset? I mean the model that is not inflated beforehand with Kinetics pre-trained weights.

shuangshuangguo commented 7 years ago

@ahkarami Hi, I think you can find the answer in Table 2 of the paper.

ahkarami commented 7 years ago

@gss-ucas Thank you for your response. However, Table 2 of the paper states that "All models are based on Imagenet pre-trained Inception-v1 ...". What I want to know is the accuracy of the 3D Inception-v1 model trained from scratch on the UCF101 dataset, without any trick (this is important for validating the effect of the inflating technique). In addition, the concept of inflating is somewhat ambiguous to me. I think it is vitally important to show the effect of the inflating technique on 3D CNN models, and from this point of view the paper has some shortcomings.

shuangshuangguo commented 7 years ago

@ahkarami The authors may not have trained from scratch on UCF101. As far as I can tell, the intention of this architecture is to leverage successful ImageNet architecture designs and even their parameters, so we don't need to worry about how well it performs when trained from scratch; we can just reuse the existing ImageNet classification model. (Of course, this is just my humble opinion; please tell me if I'm wrong.)

ahkarami commented 7 years ago

Dear @gss-ucas, thank you very much for your time and helpful opinion. However, I think it would have been better if the authors had reported the accuracy of the 3D Inception-v1 model (without inflating) on the UCF101 dataset. If they had reported that accuracy, one could easily see the effect of the inflating technique. In addition, I think we can view inflating simply as a kind of weight initialization for 3D CNN models, and I doubt that it changes the results significantly (it may just accelerate convergence, and perhaps improve accuracy a little). Moreover, the authors did not release the code for the inflating technique. It is also worth noting that, because the Kinetics dataset is very large-scale, training a 3D CNN on it and then fine-tuning on a small-scale dataset (e.g., UCF101) will probably yield good accuracy on its own, without any trick such as inflating. But that approach is completely different from the inflating technique, and I think the paper conflates the two.
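To make my understanding of inflation-as-initialization concrete, here is a minimal sketch of how I read the paper's description: a 2D ImageNet-trained kernel is repeated along a new temporal axis and rescaled by 1/T, so that a 3D conv over a clip of identical frames reproduces the 2D network's activations. This is not the authors' code; the function and variable names are just illustrative.

```python
import numpy as np

def inflate_2d_kernel(kernel_2d: np.ndarray, time_dim: int) -> np.ndarray:
    """Inflate a (kh, kw, c_in, c_out) 2D kernel into (t, kh, kw, c_in, c_out)."""
    # Repeat the 2D filter time_dim times along a new leading temporal axis.
    kernel_3d = np.repeat(kernel_2d[np.newaxis, ...], time_dim, axis=0)
    # Rescale by 1/T so the 3D response on repeated frames matches the 2D one.
    return kernel_3d / float(time_dim)

# Example: inflate a 7x7 Inception-v1 stem kernel into a 7x7x7 one.
kernel_2d = np.random.randn(7, 7, 3, 64).astype(np.float32)
kernel_3d = inflate_2d_kernel(kernel_2d, time_dim=7)
print(kernel_3d.shape)  # (7, 7, 7, 3, 64)
```

If this reading is correct, then inflation really is just an initialization scheme, which is why I would like to see the from-scratch baseline for comparison.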

baiyancheng20 commented 7 years ago

@gss-ucas @ahkarami Have you tried inflating ImageNet pre-trained models into 3D models in Caffe?

ahkarami commented 7 years ago

Dear @baiyancheng20, no, I haven't tried it in Caffe.

panna19951227 commented 6 years ago

Have you implemented this paper on UCF101? Can you share anything about it? Thank you! I am confused about how to fine-tune the pre-trained model on UCF-101.
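For concreteness, a rough TF1-style fine-tuning setup, assuming the repository's `i3d.InceptionI3d` module and the ImageNet RGB checkpoint layout (the scope names, call signature, and checkpoint path below are my assumptions based on the evaluation script, not confirmed), would look something like this:

```python
import tensorflow as tf
import i3d  # the i3d.py module from this repository

NUM_UCF_CLASSES = 101  # UCF101 has 101 action classes

# Placeholders for RGB clips (batch, frames, height, width, channels) and labels.
rgb_input = tf.placeholder(tf.float32, shape=(None, 64, 224, 224, 3))
labels = tf.placeholder(tf.int64, shape=(None,))

with tf.variable_scope('RGB'):
    model = i3d.InceptionI3d(NUM_UCF_CLASSES, final_endpoint='Logits')
    logits, _ = model(rgb_input, is_training=True, dropout_keep_prob=0.5)

# Restore every pre-trained variable except the classification layer,
# which now has 101 outputs instead of the Kinetics/ImageNet class count.
restore_vars = [v for v in tf.global_variables()
                if v.name.startswith('RGB') and 'Logits' not in v.name]
saver = tf.train.Saver(var_list=restore_vars)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.MomentumOptimizer(1e-3, 0.9).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Assumed checkpoint path; adjust to wherever the repo's checkpoints live.
    saver.restore(sess, 'data/checkpoints/rgb_imagenet/model.ckpt')
    # ...feed UCF-101 clips and labels to train_op here...
```

Is this roughly the right way to do it, or does the final layer need to be handled differently?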

smittal6 commented 6 years ago

@panna19951227 Any update on your query?

joaoluiscarreira commented 6 years ago

Hi,

the numbers for two-stream I3D trained from scratch on UCF101 are 88.8%, vs. 93.4% when starting from ImageNet. Check Table 4 in the most recent version of the arXiv paper.