Closed nayyeraafaq closed 5 years ago
Yes, the R(2+1)D-18 and -34 models output a 512-dim feature at pool-5. I plan to release an R(2+1)D-50 with a 2048-dim feature output soon.
Wow, this sounds good. I'm also looking forward to the R(2+1)D-50 with the 2048-dim feature output.
Hi @dutran,
I ran an evaluation using C3D features and R2Plus1D on a pornography dataset. In my experiments, C3D+SVM achieved the best result (95.1%), while R2Plus1D+Softmax achieved only 91.8%.
I tried fusing the RGB and optical-flow predictions of R2Plus1D, but for this problem it did not increase the accuracy.
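For readers unfamiliar with prediction-level (late) fusion, here is a minimal sketch of one common variant: a weighted average of the per-class softmax probabilities from the two streams. It is not necessarily the scheme used in the experiment above; the function names and the `rgb_weight` parameter are hypothetical, and the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, rgb_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    rgb_weight is a hypothetical tuning knob: 0.5 weights both streams
    equally, 1.0 ignores the optical-flow stream entirely.
    """
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    return [rgb_weight * r + (1.0 - rgb_weight) * f
            for r, f in zip(rgb_p, flow_p)]


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


# Made-up logits for one clip and two classes.
fused = late_fusion([2.0, 0.0], [0.0, 1.0])
label = predict(fused)
```

If the two streams make mostly the same mistakes, averaging like this adds little, which would be consistent with fusion not helping here.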
Now I'm using R2Plus1D as a feature extractor, and I noticed that on the pornography problem these features with an SVM classifier achieved lower accuracy (93.13%) than the C3D features.
On this dataset, the state-of-the-art method achieves 97.9% accuracy using a two-stream CNN (static RGB and optical flow). I'm frustrated that a 3D CNN cannot obtain better results than methods that use only still images.
Now I'm extracting R2Plus1D features from the optical-flow videos, and I will try to fuse them with the RGB features and train an SVM classifier on the result.
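One simple way to do this kind of feature-level (early) fusion is to L2-normalize each stream's feature vector and concatenate them, so neither stream dominates because of its scale. The sketch below assumes that approach; the helper names are hypothetical and the short vectors stand in for the 512-dim pool-5 features.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a feature vector to unit length (eps guards against all-zero input)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def fuse_features(rgb_feat, flow_feat):
    """Concatenate the normalized RGB and optical-flow features.

    Two 512-dim inputs give one 1024-dim fused vector per clip.
    """
    return l2_normalize(rgb_feat) + l2_normalize(flow_feat)


# Toy 2-dim features standing in for the real 512-dim ones.
fused = fuse_features([3.0, 4.0], [0.0, 2.0])
```

The fused vectors can then be passed to any SVM implementation (for example, a linear SVM from scikit-learn) in place of the single-stream features.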
I would like to experiment with R(2+1)D-50 and its 2048-dim features: with only 512 features I obtained the same result as with C3D features, so a bigger feature vector may increase the accuracy.
Any updates on this?
New, deeper models have been released. Feel free to give them a try, thanks!
Hi everyone, the paper says that the dimension of the fc layer is 400 for Kinetics and 512 for pooling. Is that right? Does any model offer 4096-dimensional feature vector extraction? Can we run feature extraction on our own videos with this network for use in other applications? Thanks