XinhaoMei / WavCaps

This repository contains metadata of the WavCaps dataset and code for downstream tasks.

To reproduce zero-shot audio classification result #17

Closed Ming-er closed 10 months ago

Ming-er commented 1 year ago

Dear author, I want to pretrain from scratch and reproduce the zero-shot audio classification result. Should I use the 'blacklist_exclude_ub8k_esc50_vggsound.json' as the blacklist file, and use the 'retrieval/settings/pretrain.yaml' as the configuration?

XinhaoMei commented 1 year ago

Yes. In the blacklist file, the ids of audio clips from AudioSet are stored with a ".wav" suffix, so modify them as needed if your data format is flac. In addition, the script has only been tested for single-card training; set bucket to false in pretrain_dataloader if you want to use multi-card training.
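
The suffix change described above could be sketched roughly as follows. This is not the author's code: the helper names `swap_extension` and `convert_blacklist` are hypothetical, and because the layout of the blacklist JSON is not shown in this thread, the sketch simply walks whatever structure it finds and rewrites every string value that ends in ".wav".

```python
import json


def swap_extension(obj, old=".wav", new=".flac"):
    """Recursively replace a trailing `old` suffix with `new` in every string value.

    Hypothetical helper: dict keys are left untouched, only values are rewritten.
    """
    if isinstance(obj, str):
        return obj[: -len(old)] + new if obj.endswith(old) else obj
    if isinstance(obj, list):
        return [swap_extension(item, old, new) for item in obj]
    if isinstance(obj, dict):
        return {key: swap_extension(value, old, new) for key, value in obj.items()}
    return obj  # numbers, booleans and None pass through unchanged


def convert_blacklist(src_path, dst_path, old=".wav", new=".flac"):
    """Read a blacklist JSON file and write a copy with the suffix swapped."""
    with open(src_path, "r", encoding="utf-8") as f:
        blacklist = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(swap_extension(blacklist, old, new), f)
```

Writing to a separate output file keeps the original blacklist intact. Since the sketch rewrites every matching string, it is worth checking afterwards that no non-AudioSet ids were changed by accident.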

Ming-er commented 1 year ago

Thank you for your advice, I will try it! If I have any more problems, I will ask you again!

Ming-er commented 1 year ago

Dear author, sorry to bother you again. Lines 39-45 of the file "retrieval/data_handling/pretrain_dataset.py" require the keys "duration" and "id" in the train.json file of the Clotho dataset. However, I cannot find them in the provided JSON file. What should I do?

XinhaoMei commented 1 year ago

Hi, 'id' is the id of the waveform in FreeSound, which is used for the blacklist, and 'duration' is the length of the audio in seconds. I have uploaded new JSON files for Clotho that contain these two keys.
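
Before starting a pretraining run, a quick check along these lines could confirm that every entry carries both keys. It is only a sketch: `find_missing_keys` is a hypothetical helper, and it takes a plain list of per-clip dicts because the exact layout of the Clotho JSON file is not shown in this thread.

```python
def find_missing_keys(entries, required=("id", "duration")):
    """Return (index, missing_keys) pairs for entries lacking any required key.

    `entries` is assumed to be a list of per-clip dicts; how that list is
    nested inside the JSON file is not covered here.
    """
    problems = []
    for index, entry in enumerate(entries):
        missing = [key for key in required if key not in entry]
        if missing:
            problems.append((index, missing))
    return problems
```

An empty result means every entry has both keys; otherwise the returned indices point at the entries that still need "id" or "duration" filled in.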

Ming-er commented 1 year ago

It really helps, thanks!