XinhaoMei / WavCaps

This repository contains metadata for the WavCaps dataset and code for downstream tasks.

About pretrained models on WavCaps for audio captioning #24

Closed hello-xiaow closed 10 months ago

hello-xiaow commented 10 months ago

Thank you for this wonderful work! I couldn't find the audio captioning models pretrained on WavCaps (the CNN14-BART and HTSAT-BART baselines). Could you provide them?

XinhaoMei commented 10 months ago

Hi, here is the link.

hello-xiaow commented 10 months ago

Thank you for your reply. Were the models you provided fine-tuned on the AudioCaps and Clotho datasets? Do you have a model pretrained on WavCaps?

XinhaoMei commented 10 months ago

For audio captioning, the baselines reported in the paper were trained only on AudioCaps or Clotho.

At this time, we do not provide checkpoints pretrained on WavCaps. Very sorry about this.

hello-xiaow commented 10 months ago

Thank you for your reply.