WingsBrokenAngel / Semantics-AssistedVideoCaptioning

Source code for Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy
MIT License
56 stars 17 forks

Questions about ResNeXt101 and ECO feature extraction, and the two npy files used to train the semantic detection network #6

Closed AndyMjw closed 4 years ago

AndyMjw commented 4 years ago

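First of all, thank you very much. Your work and open-source code have been very helpful to me. I still have a few questions from studying it:

1. In train_tag_net.py under the Semantics-AssistedVideoCaptioning-master/tagging folder, the command-line arguments include:
(1) msvd_resnext_eco.npy: video features of the MSVD dataset, shape 1970x3584
(2) msvd_tag_gt_4_msvd.npy: ground-truth semantic labels for MSVD, built from 300 words selected from MSVD, shape 1970x300
(3) msrvtt_resnext_eco.npy: video features of the MSR-VTT dataset, shape 10000x3584
(4) msrvtt_tag_gt_4_msrvtt.npy: ground-truth semantic labels for MSR-VTT, built from 300 words selected from MSR-VTT, shape 10000x300
Is my understanding above correct?

In addition, in the Data --> MSVD section, besides msvd_tag_gt_4_msvd.npy there is also a file msrvtt_tag_gt_4_msvd.npy (see the image below). Its shape is 10000x300. Is this file the ground-truth semantic labeling of MSR-VTT using the 300 words from MSVD? The next sentence there says "The previous two files are used to train the tagging network." Does that mean these two files are used to train a semantic detection network for the MSVD dataset? But as far as I can see, train_tag_net.py uses msvd_tag_gt_4_msvd.npy and msrvtt_tag_gt_4_msrvtt.npy to train a single unified semantic detection network. How do these fit together?

[image]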

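2. When extracting the ECO features, a caffemodel is used. In net = caffe.Net(model_file, model_def_file, caffe.TEST), model_def_file is the ECO_full_kinetics.caffemodel you provide, but what about model_file? From what I read online it should be a deploy.prototxt file, but none of the files you provide include one. I am not familiar with Caffe, so I am not sure how this works. Could you please advise?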

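3. I noticed that the script that produces the ResNeXt features, generate_res_feat.py, outputs a 1970x32x2048 tensor (for MSVD) and writes it to an npy file, whereas in your paper the features are average-pooled along the spatial dimension (i.e. the dimension of size 32), giving a final 1970x2048 feature. Is the average pooling done with tf.layers.average_pooling3d?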

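4. In the paper, the video feature is obtained by stacking Ei (the dynamic feature of the i-th video) onto Ri (the static feature of the i-th video), giving a 3584-dimensional feature. In your code, is the 3584-dimensional feature of each video eco+resnext (1536+2048) or resnext+eco (2048+1536)?

5. Sorry for opening an issue in Chinese on GitHub. I have asked quite a lot, and I would appreciate any guidance. Thank you.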

WingsBrokenAngel commented 4 years ago
  1. Your understanding in (1)-(4) is correct. msrvtt_tag_gt_4_msvd.npy is generated by using 300 words from the MSVD dataset to label the MSR-VTT dataset. msvd_tag_gt_4_msvd.npy and msrvtt_tag_gt_4_msvd.npy are used to train a semantic detection network for the MSVD dataset. msvd_tag_gt_4_msrvtt.npy and msrvtt_tag_gt_4_msrvtt.npy are used to train a semantic detection network for the MSR-VTT dataset.

  2. deploy.prototxt is provided in the original repository (there is a hyperlink in the README).

  3. The average pooling operation is performed along the time axis.

  4. The file that is eco+resnext is named eco_res, and the one that is resnext+eco is named res_eco.
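
Answers 3 and 4 can be illustrated with a small NumPy sketch. It is not the repository's code: the function name `pool_and_concat` is hypothetical, the arrays are random stand-ins for the real npy files, and NumPy is used in place of whatever pooling call the author actually ran. Only the shapes (32 frames, 2048-dimensional ResNeXt features, 1536-dimensional ECO features) and the eco_res / res_eco naming come from this thread.

```python
import numpy as np


def pool_and_concat(resnext_frames, eco_feats, order="res_eco"):
    """Average ResNeXt features over the time axis, then join them with ECO features.

    resnext_frames: array of shape (num_videos, num_frames, 2048)
    eco_feats:      array of shape (num_videos, 1536)
    order:          "res_eco" for resnext+eco, "eco_res" for eco+resnext
    """
    # Mean over axis 1 (the 32 sampled frames), i.e. pooling along the time axis.
    static = resnext_frames.mean(axis=1)
    if order == "res_eco":
        parts = [static, eco_feats]
    elif order == "eco_res":
        parts = [eco_feats, static]
    else:
        raise ValueError(f"unknown order: {order!r}")
    # Concatenate along the feature axis: 2048 + 1536 = 3584.
    return np.concatenate(parts, axis=1)


# Random stand-ins for the real features (4 videos instead of 1970).
rng = np.random.default_rng(0)
resnext_frames = rng.standard_normal((4, 32, 2048)).astype(np.float32)
eco_feats = rng.standard_normal((4, 1536)).astype(np.float32)

res_eco = pool_and_concat(resnext_frames, eco_feats, order="res_eco")
eco_res = pool_and_concat(resnext_frames, eco_feats, order="eco_res")
print(res_eco.shape)  # (4, 3584)
```

Both orderings give a 3584-dimensional vector per video; they differ only in which block of columns holds the ResNeXt part, so a model trained on one ordering should be fed features in that same ordering.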

AndyMjw commented 4 years ago

Very useful, thank you.

AndyMjw commented 4 years ago

Sorry, I cannot find the "deploy.prototxt" link in the README; only the ECO_full_kinetics.caffemodel link is there. Could you provide one? Thank you.

WingsBrokenAngel commented 4 years ago

You can find it in the ECO source code: https://github.com/mzolfaghari/ECO-efficient-video-understanding
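
For readers new to Caffe, here is a minimal sketch of how the two files fit together once the prototxt has been obtained from the ECO repository. In pycaffe's `caffe.Net(network_file, weights_file, phase)` constructor, the first argument is the network definition (the prototxt) and the second is the trained weights (the caffemodel). The helper name `load_eco_net` and the example paths are assumptions for illustration, and the prototxt in the ECO repository may have a different file name.

```python
from pathlib import Path


def load_eco_net(prototxt_path, weights_path):
    """Load a Caffe network in test mode from a definition file and a weights file.

    prototxt_path: network definition (e.g. a deploy prototxt from the ECO repository)
    weights_path:  trained weights (e.g. ECO_full_kinetics.caffemodel)
    """
    prototxt = Path(prototxt_path)
    weights = Path(weights_path)
    # Fail early with a clear message instead of a low-level Caffe error.
    for path in (prototxt, weights):
        if not path.is_file():
            raise FileNotFoundError(f"missing file: {path}")
    # Imported here so the path check works even where Caffe is not installed.
    import caffe
    # Definition first, weights second, then the phase.
    return caffe.Net(str(prototxt), str(weights), caffe.TEST)


# Example call (hypothetical paths, requires a working Caffe installation):
# net = load_eco_net("models/deploy.prototxt", "models/ECO_full_kinetics.caffemodel")
```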