PaddlePaddle / PaddleVideo

Awesome video understanding toolkits based on PaddlePaddle. It supports video data annotation tools, lightweight RGB and skeleton based action recognition model, practical applications for video tagging and sport action detection.
Apache License 2.0

Football temporal action localization project: why does it use PCM instead of the audio features? #620

Open Hubert2102 opened 1 year ago

Hubert2102 commented 1 year ago

I followed the documentation step by step. During feature extraction, the saved pkl stores three kinds of data: video_features = { 'image_feature': np_image_features, 'audio_feature': np_audio_features, 'pcm_feature': np_pcm_features }
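To check which per-frame dimension each stored feature actually has, you can reload the pkl and print the array shapes. The sketch below writes a dict with the same three keys to a temporary file and reads it back. The file path, the frame count, and every dimension (including 128 for audio_feature) are hypothetical placeholders, not values taken from PaddleVideo.

```python
import os
import pickle
import tempfile

import numpy as np

# Placeholder arrays standing in for the real extracted features.
# Shapes are hypothetical: 100 time steps, made-up per-frame dimensions.
video_features = {
    'image_feature': np.zeros((100, 2048), dtype=np.float32),
    'audio_feature': np.zeros((100, 128), dtype=np.float32),
    'pcm_feature': np.zeros((100, 640), dtype=np.float32),
}

# Hypothetical output path; the real pipeline writes its own pkl files.
path = os.path.join(tempfile.mkdtemp(), 'example_features.pkl')
with open(path, 'wb') as f:
    pickle.dump(video_features, f)

# Reload and report each feature's shape: shape[1] is the per-frame dimension.
with open(path, 'rb') as f:
    loaded = pickle.load(f)
for name, arr in loaded.items():
    print(name, arr.shape)
```

Running the same loop on one of your real pkl files shows the dimensions your extraction step actually produced.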

However, get_instance_for_bmn.py does not use audio_feature at all: feature_video = np.concatenate((image_feature, pcm_feature), axis=1). In addition, train_proposal/configs/bmn_football_v2.0.yaml contains feat_dim: 2688 #train bmn with image feature. If add audio feature, set to 2688, so the 640-dimensional PCM feature is indeed what gets used.
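The relationship between the concatenation and feat_dim can be sketched as follows. Concatenating along axis=1 stacks the per-frame vectors side by side, so the resulting width is the sum of the individual feature dimensions. The 2048-dimensional image feature is an assumption derived from 2688 - 640, and the random arrays and frame count are placeholders.

```python
import numpy as np

# Hypothetical shapes: T time steps; 2048-dim image features (assumed from 2688 - 640)
# and 640-dim PCM features per step. Random data stands in for the real pkl contents.
T = 100
image_feature = np.random.rand(T, 2048).astype(np.float32)
pcm_feature = np.random.rand(T, 640).astype(np.float32)

# Same operation as in get_instance_for_bmn.py: join per-step features along axis 1.
feature_video = np.concatenate((image_feature, pcm_feature), axis=1)

# The width of the concatenated feature is what feat_dim in the yaml must match.
feat_dim = feature_video.shape[1]
print(feature_video.shape)  # (100, 2688)
```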

So where is audio_feature used? Or is using PCM directly sufficient?

westfish commented 8 months ago

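In your case, the training process appears to use only the image and PCM features. If you want to use the audio features as well, you could modify the data loader code to include them, update the model input to accept the extra audio feature dimension, and change feat_dim in the config to the new total feature dimension.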
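A minimal sketch of the data-loading side of that suggestion, assuming audio_feature has one row per time step aligned with the image and PCM features (if your audio features are sampled at a different rate, they would need resampling or pooling first). The helper name build_feature_video is hypothetical, not part of PaddleVideo, and the 128-dim audio size is a placeholder.

```python
import numpy as np


def build_feature_video(video_features, use_audio=False):
    """Hypothetical helper: concatenate per-step features from the pkl dict.

    Assumes every feature has the same number of rows (time steps).
    """
    parts = [video_features['image_feature'], video_features['pcm_feature']]
    if use_audio:
        parts.append(video_features['audio_feature'])
    lengths = {p.shape[0] for p in parts}
    if len(lengths) != 1:
        raise ValueError(f"feature lengths differ: {sorted(lengths)}; align them first")
    return np.concatenate(parts, axis=1)


feats = {
    'image_feature': np.zeros((100, 2048), dtype=np.float32),
    'pcm_feature': np.zeros((100, 640), dtype=np.float32),
    'audio_feature': np.zeros((100, 128), dtype=np.float32),  # hypothetical dim
}
feature_video = build_feature_video(feats, use_audio=True)
feat_dim = feature_video.shape[1]  # value to put in feat_dim in the yaml
print(feat_dim)  # 2816
```

Raising an error on mismatched lengths, rather than silently truncating, makes a misaligned audio stream visible before training starts.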