ssp789 closed this issue 3 months ago.
Hi,

The context files are provided alongside the ground truth files here. These are the `train_context` and `val_context` pickle files you need for both the visual and audio modalities.
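The thread does not describe what these context pickles contain. As a minimal sketch, assuming only that they are standard Python pickles, you could load one and summarise its top-level structure before passing it to `--video_train_context_pickle` or `--audio_train_context_pickle`. The helper `summarize_pickle` is hypothetical, and if a pickle holds a pandas DataFrame, pandas must be installed for it to load.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Hypothetical helper: the real context pickles may hold a dict, a list,
    or a pandas DataFrame; this only reports type, length, and a few keys.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["len"] = len(obj)
    if isinstance(obj, dict):
        summary["first_keys"] = list(obj)[:5]
    elif hasattr(obj, "columns"):  # e.g. a pandas DataFrame
        summary["columns"] = list(obj.columns)
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_pickle.py file1.pkl file2.pkl ...
    for p in sys.argv[1:]:
        print(p, summarize_pickle(p))
```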
For `visual_input_dim` and `audio_input_dim`, the value depends on which features you have extracted. Our audio features were Auditory SlowFast, hence `audio_input_dim=2304`. If you only extracted Omnivore or VideoMAE features, then `visual_input_dim=1024`. If you concatenated them along the channel dimension as we did, then `visual_input_dim=2048`.
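The arithmetic above is a sum of per-extractor widths: concatenating along the channel dimension appends each timestep's feature vectors end to end. A small sketch using the widths quoted in this thread; the names `FEATURE_DIMS`, `concat_channels`, and `input_dim` are hypothetical, and real features would normally be arrays rather than Python lists.

```python
# Feature widths quoted in this thread; names are hypothetical,
# not part of the repository's code.
FEATURE_DIMS = {
    "omnivore": 1024,
    "videomae": 1024,
    "auditory_slowfast": 2304,
}


def concat_channels(*streams):
    """Concatenate per-timestep feature vectors along the channel axis.

    Each stream is a list of rows (one row per timestep); all streams
    must have the same number of timesteps.
    """
    if len({len(s) for s in streams}) != 1:
        raise ValueError("streams must have the same number of timesteps")
    # zip pairs up the rows for each timestep; sum(..., []) joins the lists.
    return [sum(rows, []) for rows in zip(*streams)]


def input_dim(extractors):
    """Channel width after concatenating the named extractors' features."""
    return sum(FEATURE_DIMS[name] for name in extractors)


visual_input_dim = input_dim(["omnivore", "videomae"])  # 1024 + 1024 = 2048
audio_input_dim = input_dim(["auditory_slowfast"])       # 2304
```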
You can find the context files here.
The feature dimension depends on which visual and audio features you use. If you use Omnivore + ASlowFast, `visual_input_dim` should be 1024 and `audio_input_dim` should be 2304.
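One hedged way to catch a mismatched flag early is to compare the width of the loaded features against the value you intend to pass. The helper below is hypothetical and assumes the features are a sequence of per-timestep rows (lists or 1-D arrays); with Omnivore features alone the check should pass for 1024, and with Auditory SlowFast features for 2304.

```python
def check_feature_width(features, expected_dim, name):
    """Raise ValueError if any row's width differs from the declared input dim.

    Hypothetical helper, not part of the repository. `features` is assumed
    to be a sequence of per-timestep rows.
    """
    for i, row in enumerate(features):
        if len(row) != expected_dim:
            raise ValueError(
                f"{name}: row {i} has width {len(row)}, "
                f"but the input dim flag says {expected_dim}"
            )
    return True


if __name__ == "__main__":
    # Dummy Omnivore-sized rows: passes for 1024, would fail for 2048.
    omnivore_rows = [[0.0] * 1024 for _ in range(4)]
    print(check_feature_width(omnivore_rows, 1024, "visual"))
```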
Thank you for your reply.
Hello, is there a mismatch between the EPIC training script you provided and the dataset?
python scripts/run_net.py \
--train \
--output_dir /path/to/output \
--video_data_path /path/to/epic_visual_features \
--video_train_action_pickle /path/to/epic_100_train_annotations \
--video_val_action_pickle /path/to/epic_100_validation_annotations \
--video_train_context_pickle /path/to/epic_100_train_visual_feature_intervals \
--video_val_context_pickle /path/to/epic_100_validation_visual_feature_intervals \
--visual_input_dim \
--audio_data_path /path/to/epic_audio_features \
--audio_train_action_pickle /path/to/epic_sounds_train_annotations \
--audio_val_action_pickle /path/to/epic_sounds_validation_annotations \
--audio_train_context_pickle /path/to/epic_sounds_train_audio_feature_intervals \
--audio_val_context_pickle /path/to/epic_sounds_validation_audio_feature_intervals \
--audio_input_dim \
--video_info_pickle /path/to/epic_kitchens_video_metadata \
--lambda_audio 0.01
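The template leaves the values of `--visual_input_dim` and `--audio_input_dim` blank, so it will not run as written. A sketch that assembles the argument list from a dict and refuses to build it while any value is blank; the dict, `build_command`, and the example fill-in values (2048 for Omnivore + VideoMAE, 2304 for Auditory SlowFast, as quoted earlier in this thread) are illustrative assumptions, and the paths are the template's placeholders.

```python
import shlex

# Flag values copied from the template; "" marks the values it leaves blank.
# Paths are placeholders, not real dataset file names.
TEMPLATE_FLAGS = {
    "--output_dir": "/path/to/output",
    "--video_data_path": "/path/to/epic_visual_features",
    "--video_train_action_pickle": "/path/to/epic_100_train_annotations",
    "--video_val_action_pickle": "/path/to/epic_100_validation_annotations",
    "--video_train_context_pickle": "/path/to/epic_100_train_visual_feature_intervals",
    "--video_val_context_pickle": "/path/to/epic_100_validation_visual_feature_intervals",
    "--visual_input_dim": "",
    "--audio_data_path": "/path/to/epic_audio_features",
    "--audio_train_action_pickle": "/path/to/epic_sounds_train_annotations",
    "--audio_val_action_pickle": "/path/to/epic_sounds_validation_annotations",
    "--audio_train_context_pickle": "/path/to/epic_sounds_train_audio_feature_intervals",
    "--audio_val_context_pickle": "/path/to/epic_sounds_validation_audio_feature_intervals",
    "--audio_input_dim": "",
    "--video_info_pickle": "/path/to/epic_kitchens_video_metadata",
    "--lambda_audio": "0.01",
}


def build_command(flags, script="scripts/run_net.py"):
    """Return an argv list for a training run, rejecting blank flag values."""
    blank = [flag for flag, value in flags.items() if value == ""]
    if blank:
        raise ValueError("no value given for: " + ", ".join(blank))
    argv = ["python", script, "--train"]
    for flag, value in flags.items():
        argv += [flag, value]
    return argv


if __name__ == "__main__":
    # Example fill-in: Omnivore + VideoMAE visual features, Auditory SlowFast audio.
    flags = {**TEMPLATE_FLAGS, "--visual_input_dim": "2048", "--audio_input_dim": "2304"}
    print(shlex.join(build_command(flags)))
```

Building an argv list rather than a single string avoids quoting problems when paths contain spaces; `shlex.join` is only used to print a copy-pasteable command.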
--video_train_context_pickle /path/to/epic_100_train_visual_feature_intervals \
--video_val_context_pickle /path/to/epic_100_validation_visual_feature_intervals \
--visual_input_dim \
and
--audio_train_context_pickle /path/to/epic_sounds_train_audio_feature_intervals \
--audio_val_context_pickle /path/to/epic_sounds_validation_audio_feature_intervals \
--audio_input_dim \
--video_info_pickle /path/to/epic_kitchens_video_metadata \
Are these files not provided? If they are, which files in the dataset should be passed for them?
Thank you for your reply.
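To see locally which of these inputs are actually missing, one could check each path before launching. A minimal sketch; `missing_inputs` is a hypothetical helper and the paths are the template's placeholders, to be replaced with your own. An empty value is treated as missing because `Path("")` resolves to the current directory, which always exists.

```python
from pathlib import Path


def missing_inputs(paths):
    """Return the flags whose path is blank or does not exist on disk.

    `paths` maps a command-line flag to the file you intend to pass.
    """
    return [flag for flag, p in paths.items() if not p or not Path(p).exists()]


if __name__ == "__main__":
    # Placeholder paths taken from the template; replace with your own.
    wanted = {
        "--video_train_context_pickle": "/path/to/epic_100_train_visual_feature_intervals",
        "--video_val_context_pickle": "/path/to/epic_100_validation_visual_feature_intervals",
        "--audio_train_context_pickle": "/path/to/epic_sounds_train_audio_feature_intervals",
        "--audio_val_context_pickle": "/path/to/epic_sounds_validation_audio_feature_intervals",
        "--video_info_pickle": "/path/to/epic_kitchens_video_metadata",
    }
    for flag in missing_inputs(wanted):
        print("missing:", flag)
```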