Pretrained models on VGG-Sound, AudioSet and LibriSpeech

aleflabo commented 1 year ago

Hi authors,

I'm trying to reproduce the results reported in Table 3 of the paper. However, the checkpoints linked in the repo are already fine-tuned on the EPIC-SOUNDS dataset.

The README commands for fine-tuning, training from scratch and training the linear probe require checkpoints pretrained on VGG-Sound for ASF (Auditory SlowFast) and on AudioSet and LibriSpeech for SSAST. Am I missing something?

Thank you, Alessandro Flaborea

aleflabo commented 1 year ago

I have attached the runs I made using the commands from the README. The fine-tuning run reaches the paper's results immediately, whereas the other two remain far from them.

Fine-tuning:

    python tools/run_net.py \
      --cfg configs/EPIC-Sounds/slowfast/SLOWFASTAUDIO_8x8_R50.yaml \
      NUM_GPUS 2 \
      OUTPUT_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations/src/output \
      EPICSOUNDS.AUDIO_DATA_FILE /home/aleflabo/epic-kitchens/epic-sounds-data/EPIC_audio.hdf5 \
      EPICSOUNDS.ANNOTATIONS_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations \
      TRAIN.CHECKPOINT_FILE_PATH /home/aleflabo/epic-kitchens/epic-sounds-annotations/src/pretrained/SLOWFAST_EPIC_SOUNDS.pyth

From scratch:

    python tools/run_net.py \
      --cfg configs/EPIC-Sounds/slowfast/SLOWFASTAUDIO_8x8_R50.yaml \
      NUM_GPUS 2 \
      OUTPUT_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations/src/output \
      EPICSOUNDS.AUDIO_DATA_FILE /home/aleflabo/epic-kitchens/epic-sounds-data/EPIC_audio.hdf5 \
      EPICSOUNDS.ANNOTATIONS_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations

Linear probe:

    python tools/run_net.py \
      --cfg configs/EPIC-Sounds/slowfast/SLOWFASTAUDIO_8x8_R50.yaml \
      NUM_GPUS 2 \
      OUTPUT_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations/src/output \
      EPICSOUNDS.AUDIO_DATA_FILE /home/aleflabo/epic-kitchens/epic-sounds-data/EPIC_audio.hdf5 \
      EPICSOUNDS.ANNOTATIONS_DIR /home/aleflabo/epic-kitchens/epic-sounds-annotations \
      MODEL.FREEZE_BACKBONE True

JacobChalk commented 1 year ago

Hi,

Correct: the released weights are already fine-tuned on EPIC-SOUNDS. To train from the original pretrained models, you can download the pretrained SlowFast models (including the VGG-Sound one) from here (file name SLOWFAST_VGG.pyth on Dropbox). For SSAST, you can download the weights from here (file name SSAST-Base-Patch-400.pth on Dropbox). I will upload our versions to Dropbox for convenience; in the meantime, you can get them from these links.

The "from scratch" and "linear probe" runs cannot be reproduced correctly without the pretrained models. Once you have the files, appending TRAIN.CHECKPOINT_FILE_PATH <path-to-SLOWFAST_VGG.pyth> to your commands should fix it.
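As a rough illustration of that fix, the sketch below assembles the linear-probe command from this thread with the pretrained checkpoint override appended. It assumes `tools/run_net.py` takes config overrides as trailing KEY VALUE pairs, which is how the commands above pass them. The helper name `build_run_net_cmd` and all paths in the usage block are hypothetical placeholders, not part of the repo.

```python
import shlex


def build_run_net_cmd(cfg, checkpoint, output_dir, audio_file, annotations_dir,
                      num_gpus=2, freeze_backbone=False):
    """Hypothetical helper: build a run_net.py argv list that starts from a pretrained checkpoint."""
    cmd = [
        "python", "tools/run_net.py",
        "--cfg", cfg,
        # Config overrides follow as KEY VALUE pairs, as in the commands above
        "NUM_GPUS", str(num_gpus),
        "OUTPUT_DIR", output_dir,
        "EPICSOUNDS.AUDIO_DATA_FILE", audio_file,
        "EPICSOUNDS.ANNOTATIONS_DIR", annotations_dir,
        # Initialise from the pretrained weights (e.g. SLOWFAST_VGG.pyth), not EPIC-SOUNDS ones
        "TRAIN.CHECKPOINT_FILE_PATH", checkpoint,
    ]
    if freeze_backbone:
        # Linear probe: only the classifier head is trained
        cmd += ["MODEL.FREEZE_BACKBONE", "True"]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own
    cmd = build_run_net_cmd(
        cfg="configs/EPIC-Sounds/slowfast/SLOWFASTAUDIO_8x8_R50.yaml",
        checkpoint="/path/to/SLOWFAST_VGG.pyth",
        output_dir="/path/to/output",
        audio_file="/path/to/EPIC_audio.hdf5",
        annotations_dir="/path/to/epic-sounds-annotations",
        freeze_backbone=True,
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. Dropping `freeze_backbone=True` gives the corresponding run with the backbone unfrozen.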

NOTE: Our checkpoint-loading code looks for a 'model_state' key in the checkpoint in order to load the weights properly. This key is not present in the SSAST checkpoint from their GitHub (it is present in our version), so you will first need to wrap it: ssast_ckpt = {'model_state': <loaded_ssast_checkpoint_file>}; torch.save(ssast_ckpt, <file-path>).
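The one-liner in the note can be expanded into a small script. This is a minimal sketch, assuming the SSAST file loads with `torch.load` into a plain state dict. The function name `wrap_ssast_checkpoint` and the skip-if-already-wrapped check are illustrative additions, not part of the EPIC-SOUNDS code.

```python
import sys

import torch


def wrap_ssast_checkpoint(src_path, dst_path):
    """Hypothetical helper: store a raw SSAST checkpoint under the 'model_state' key."""
    ckpt = torch.load(src_path, map_location="cpu")
    if isinstance(ckpt, dict) and "model_state" in ckpt:
        # Already in the expected layout (e.g. the EPIC-SOUNDS version); don't nest it twice
        wrapped = ckpt
    else:
        wrapped = {"model_state": ckpt}
    torch.save(wrapped, dst_path)
    return wrapped


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: python wrap_ssast.py <SSAST-Base-Patch-400.pth> <output.pth>")
    wrap_ssast_checkpoint(sys.argv[1], sys.argv[2])
```

The resulting file is then the one to pass via TRAIN.CHECKPOINT_FILE_PATH for the SSAST runs.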

UPDATE: The README now includes Dropbox links to the pretrained SlowFast and SSAST files that we used ourselves.

aleflabo commented 1 year ago

Thanks for your quick and helpful answer! Just to let you know, the links in the README seem to be broken at the moment.

JacobChalk commented 1 year ago

Thanks for letting me know; the links should now be fixed!

epic-kitchens / epic-sounds-annotations

Pretrained models on VGG-Sound, AudioSet and LibriSpeech #10