Questions about ResNeXt101 and ECO feature extraction, and the two npy files used to train the semantic detection network

AndyMjw commented 4 years ago

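First of all, thank you very much: your work and open-source code have been a great help to me. I still have a few questions from studying it.

1. In train_tag_net.py under Semantics-AssistedVideoCaptioning-master/tagging, the command-line arguments include:

(1) msvd_resnext_eco.npy: video features of the MSVD dataset, shape 1970x3584
(2) msvd_tag_gt_4_msvd.npy: ground-truth semantic tags of shape 1970x300, obtained by labeling MSVD with 300 words selected from MSVD
(3) msrvtt_resnext_eco.npy: video features of the MSR-VTT dataset, shape 10000x3584
(4) msrvtt_tag_gt_4_msrvtt.npy: ground-truth semantic tags of shape 10000x300, obtained by labeling MSR-VTT with 300 words selected from MSR-VTT

Is my understanding above correct? In addition, the Data --> MSVD section provides not only msvd_tag_gt_4_msvd.npy but also msrvtt_tag_gt_4_msvd.npy (shown in the screenshot below), whose shape is 10000x300. Is this file the ground-truth semantic tagging of MSR-VTT using the 300 MSVD words? Below it there is also the sentence "The previous two files are used to train the tagging network." Does that mean these two files are used to train a semantic detection network for the MSVD dataset? As far as I can see, though, train_tag_net.py uses msvd_tag_gt_4_msvd.npy and msrvtt_tag_gt_4_msrvtt.npy to train a single unified semantic detection network. How do these fit together?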

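2. When extracting the ECO features, a caffemodel is used. In net = caffe.Net(model_file, model_def_file, caffe.TEST), model_def_file is the ECO_full_kinetics.caffemodel you provide, but what about model_file? From what I have read online it should be a deploy.prototxt file, but none of the files you provide includes one. I am not familiar with the Caffe toolchain and do not know how this works, so I would appreciate your guidance.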

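3. I noticed that generate_res_feat.py, the file that produces the ResNeXt features, ends up with a 1970x32x2048 tensor (for MSVD) and writes it to an npy file, whereas in your paper the features are average-pooled over the spatial dimension (i.e. the dimension of size 32) to give a final 1970x2048 feature. Is the average pooling done with tf.layers.average_pooling3d?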

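4. In the paper, the video feature is obtained by stacking Ei (the dynamic feature of the i-th video) onto Ri (the static feature of the i-th video), giving a 3584-dimensional feature. In your code, is each video's 3584-dimensional feature eco+resnext (1536+2048) or resnext+eco (2048+1536)?

5. Apologies for filing an issue in Chinese on GitHub. I have asked quite a lot, and I would be grateful for any pointers. Thank you.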

WingsBrokenAngel commented 4 years ago

1. (1)-(4) are right. msrvtt_tag_gt_4_msvd.npy is generated by using 300 words from the MSVD dataset to label the MSR-VTT dataset. msvd_tag_gt_4_msvd.npy and msrvtt_tag_gt_4_msvd.npy are used to train a semantic detection network for the MSVD dataset. msvd_tag_gt_4_msrvtt.npy and msrvtt_tag_gt_4_msrvtt.npy are used to train a semantic detection network for the MSR-VTT dataset.
2. deploy.prototxt is provided in the original repository (there is a hyperlink in the README).
3. The average pooling operation is performed along the time axis.
4. The file that is eco+resnext is named eco_res, and the one that is resnext+eco is named res_eco.
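
For readers following along, the pooling and concatenation described above can be sketched in NumPy. This is only an illustration, not the repository's code: it assumes the axis of size 32 in the ResNeXt tensor is the time (frame) axis, uses random arrays in place of the real features, and the helper names `pool_over_time` and `stack_features` are made up for this example.

```python
import numpy as np


def pool_over_time(frame_feats):
    """Average per-frame features over the time axis: (N, T, D) -> (N, D)."""
    return frame_feats.mean(axis=1)


def stack_features(res, eco, order="res_eco"):
    """Concatenate per-video features along the channel axis.

    "res_eco" puts the ResNeXt part first (2048 + 1536);
    "eco_res" puts the ECO part first (1536 + 2048).
    """
    if order == "res_eco":
        return np.concatenate([res, eco], axis=1)
    if order == "eco_res":
        return np.concatenate([eco, res], axis=1)
    raise ValueError("order must be 'res_eco' or 'eco_res'")


# Random stand-ins for the real features; MSVD would have 1970 videos.
rng = np.random.default_rng(0)
n_videos = 4
resnext_frames = rng.standard_normal((n_videos, 32, 2048)).astype(np.float32)
eco = rng.standard_normal((n_videos, 1536)).astype(np.float32)

resnext = pool_over_time(resnext_frames)             # (4, 2048)
res_eco = stack_features(resnext, eco, "res_eco")    # (4, 3584)
eco_res = stack_features(resnext, eco, "eco_res")    # (4, 3584)
print(resnext.shape, res_eco.shape, eco_res.shape)
```

A plain mean over axis 1 is enough here, so a 3D pooling layer is not required for this step. Whichever order is chosen, it has to match the order of the features the downstream model was trained on.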

AndyMjw commented 4 years ago

Very useful, thank you.

AndyMjw commented 4 years ago

Sorry, I cannot find the "deploy.prototxt" link in the README; only the ECO_full_kinetics.caffemodel link is there. Could you provide one? Thank you.

WingsBrokenAngel commented 4 years ago

You can find it in ECO's source code: https://github.com/mzolfaghari/ECO-efficient-video-understanding
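
Once both files are on disk, loading the network in pycaffe might look like the sketch below. It assumes pycaffe is installed and that the file paths point to wherever you saved the files; the helper name `load_eco_net` is hypothetical. The point it illustrates is the argument order: the network definition (deploy.prototxt) goes first and the weights (.caffemodel) second.

```python
import os


def load_eco_net(model_def_file, model_weights_file):
    """Load a Caffe network in test mode from a prototxt and a caffemodel."""
    # Fail early with a clear error if either file is missing.
    for path in (model_def_file, model_weights_file):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    # Imported lazily so the path checks above work without pycaffe installed.
    import caffe
    # Positional order: network definition, then weights, then phase.
    return caffe.Net(model_def_file, model_weights_file, caffe.TEST)


# Example usage (paths are placeholders for your local copies):
# net = load_eco_net("deploy.prototxt", "ECO_full_kinetics.caffemodel")
# print(list(net.blobs.keys())[:5])  # names of the first few blobs
```

In other words, the .caffemodel only stores the learned weights, while the layer structure comes from the prototxt, which is why both files are needed.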
