LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

Dataset used for checkpoint in the huggingface #123

Open jaeyeonkim99 opened 1 year ago

jaeyeonkim99 commented 1 year ago

Hello! I am currently using the CLAP model for my research, and the checkpoint from the Hugging Face transformers library ("laion/clap-htsat-unfused") works best for me. However, unlike for the models linked in this repository, I cannot find exactly which datasets were used for that checkpoint.

Could I get exact information about the datasets used for the Hugging Face pretrained models?
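For context, the checkpoint being asked about can be loaded through the transformers library like this. This is only a minimal sketch of the standard `ClapModel`/`ClapProcessor` usage with the model id mentioned in this thread; the example captions are made up for illustration, and running it will download the checkpoint from the Hugging Face Hub.

```python
# Minimal sketch: loading the unfused CLAP checkpoint discussed in this
# thread via the transformers library. Requires network access to the
# Hugging Face Hub on first run.
from transformers import ClapModel, ClapProcessor

model_id = "laion/clap-htsat-unfused"  # checkpoint named in this issue
model = ClapModel.from_pretrained(model_id)
processor = ClapProcessor.from_pretrained(model_id)

# Embed a couple of example captions; audio would be passed to the same
# processor via its feature extractor and model.get_audio_features.
inputs = processor(
    text=["a dog barking", "piano music"],
    return_tensors="pt",
    padding=True,
)
text_embeds = model.get_text_features(**inputs)
print(text_embeds.shape)
```

Audio clips go through the same processor (as raw waveforms) and `model.get_audio_features`, after which text and audio embeddings can be compared with cosine similarity.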

RetroCirce commented 1 year ago

Hi,

I think the unfused and fused models in the Hugging Face transformers library come from these two checkpoints of ours:

They are the models we presented in the paper, while the other models (such as music, music+speech+...) come from our continued training after the paper was published.

Best, Ke