GeWu-Lab / OGM-GE_CVPR2022

The repo for "Balanced Multimodal Learning via On-the-fly Gradient Modulation", CVPR 2022 (ORAL)
MIT License

Processing method of VGGSound and Kinetics-Sounds data #27

Open HPU-Yz opened 1 year ago

HPU-Yz commented 1 year ago

Hello, could you describe how you processed the VGGSound and Kinetics-Sounds data? For VGGSound, the link you gave only provides a vggsound.csv file, which does not match the data layout that mp4_to_wav.py expects. The same is true for Kinetics-Sounds: the dataset I downloaded does not match the one handled in dataset.py. Could you share your processed datasets or the specific processing steps?

I am interested in your method and hope to hear from you.

DTaoo commented 1 year ago

Hi, thanks for your interest.

For more details about the dataset, you can view this issue: https://github.com/GeWu-Lab/OGM-GE_CVPR2022/issues/14

Best, Di

echo0409 commented 1 year ago

Hi, thanks for your interest.

For preprocessing the VGGSound and KS (Kinetics-Sounds) datasets, you can refer to these scripts:

https://github.com/GeWu-Lab/OGM-GE_CVPR2022/blob/main/data/VGGSound/video_preprocessing.py
https://github.com/hche11/VGGSound/blob/master/preprocess_audio.py
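
For readers who hit the same mismatch: vggsound.csv is only an index of clips (YouTube ID, start second, label, train/test split), so the videos have to be downloaded separately before any audio can be extracted. The following is a minimal sketch of the index-to-audio step, not the scripts linked above. It assumes the clips are already on disk and named `<YouTube ID>_<start second, zero-padded to 6 digits>.mp4`; that naming, the helper names `read_clip_index` and `ffmpeg_wav_command`, and the 16 kHz mono output are illustrative assumptions, so adjust them to whatever the linked scripts and dataset.py expect.

```python
import csv
import io
from pathlib import Path


def read_clip_index(csv_text):
    """Parse vggsound.csv-style rows: YouTube ID, start seconds, label, split."""
    clips = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 4:
            continue  # skip blank or malformed rows
        video_id, start, label, split = (field.strip() for field in row)
        clips.append(
            {"video_id": video_id, "start": int(start), "label": label, "split": split}
        )
    return clips


def ffmpeg_wav_command(clip, video_dir, audio_dir, sample_rate=16000):
    """Build (but do not run) an ffmpeg command that extracts mono WAV audio.

    Assumption: the clip is stored as <video_id>_<start:06d>.mp4 in video_dir.
    """
    stem = f"{clip['video_id']}_{clip['start']:06d}"
    src = Path(video_dir) / f"{stem}.mp4"
    dst = Path(audio_dir) / f"{stem}.wav"
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar sets the sample rate
    return ["ffmpeg", "-y", "-i", str(src), "-vn", "-ac", "1", "-ar", str(sample_rate), str(dst)]
```

Each returned command can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed. Building the command separately from running it makes it easy to print and inspect the paths first, which is usually where a mismatch between the downloaded data and the expected layout shows up.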