XinhaoMei / WavCaps

This repository contains the metadata of the WavCaps dataset and code for downstream tasks.

Question about pretrain #16

Closed marmoi closed 1 year ago

marmoi commented 1 year ago

Hi, thank you for publishing WavCaps! It is really useful. I am trying to reproduce the results with HTSAT-BART using pretrain.py, but I got the following errors:

Am I doing something wrong? Thank you.

XinhaoMei commented 1 year ago

Hi, thanks for your interest!

First, json_files should contain the paths to the JSON files of the data you want to use; yes, they should be the ones we provided in the Hugging Face repo. Second, you can add a "duration" key for those two datasets, whose value should be the length of the audio clip in seconds. This key is used to group audio clips of similar duration into the same batch. However, this bucketing is only applicable to single-card training; if you want to use multi-card training, please set bucket=False when using the pretrain_dataloader.
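
For concreteness, here is a minimal sketch of what that looks like. The field names in the example entry and the pretrain_dataloader call below are illustrative assumptions based on this thread, not the repo's exact schema or signature:

```python
# Hypothetical entry in one of the WavCaps-style JSON files. Only the
# "duration" key is the one discussed above; the other field names are
# assumptions for illustration.
entry = {
    "id": "example_0001",
    "audio": "path/to/clip.flac",
    "caption": "Rain falls steadily on a metal roof.",
    "duration": 10.0,  # clip length in seconds, used to bucket clips of similar length
}

# Single-card training: duration bucketing can stay on (assumed call):
# dataloader = pretrain_dataloader(config, bucket=True)

# Multi-card training: disable bucketing, as suggested above:
# dataloader = pretrain_dataloader(config, bucket=False)
```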

I hope this is helpful!

marmoi commented 1 year ago

Hi, thank you for your reply! It was very helpful. I have another question regarding the results in Table VI of the paper. For the Clotho results with HTSAT-BART (baseline), was the model trained only on Clotho, or was it also pretrained on AudioCaps and then fine-tuned on Clotho?

Thank you!

XinhaoMei commented 1 year ago

Hi, for the baseline models, we trained them only on Clotho or AudioCaps; pretraining was not used.