LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

Training dataset for the newly released CLAP music checkpoint #97

Closed RookieJunChen closed 1 year ago

RookieJunChen commented 1 year ago

I’m very impressed by the new CLAP checkpoints pretrained on music that you just added to your GitHub repository. I think this is a very interesting and meaningful update! I have one question about it: which music-related datasets 🎵 did you use to pretrain this checkpoint?

RetroCirce commented 1 year ago

You can refer to this repo: https://github.com/LAION-AI/audio-dataset/blob/main/data_collection/README.md and check the "Music Dataset" section. We chose some of the datasets listed there for training the music checkpoint (together with AudioSet).

However, we cannot fully confirm that every dataset under "Music Dataset" permits this kind of training (some datasets prohibit use for machine learning training because of copyright issues and artist protections). Therefore, we cannot state exactly which datasets we used, and we thank you for your understanding. Once we have worked out the license of each dataset, we will clarify this.