LAION-AI / audio-dataset

Audio Dataset for training CLAP and other models

Which is the most suitable Music Dataset for training MuLaN? #86

Open ukemamaster opened 1 year ago

ukemamaster commented 1 year ago

@marianna13 Hi Mariana, it seems there are several options for a music dataset. Could you recommend one (or several) for training the MuLaN model?

The MuLaN authors used 44 million music recordings (almost 370K hours). The following table shows some examples of their texts, which come in 3 different types.

[Image: Screenshot from 2023-02-09 11-50-18]
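
For a sense of scale, the two figures quoted above work out to roughly 30 seconds per recording on average. Below is a quick sanity check of that arithmetic; it uses only the numbers in this comment, treats "almost 370K hours" as exactly 370,000 (so the result is a slight overestimate), and the function name is just illustrative.

```python
# Figures quoted above: 44 million recordings, almost 370K hours in total.
NUM_RECORDINGS = 44_000_000
TOTAL_HOURS = 370_000  # "almost 370K", so treat this as an upper bound


def average_clip_seconds(total_hours: float, num_recordings: int) -> float:
    """Mean duration per recording, in seconds."""
    return total_hours * 3600 / num_recordings


avg_seconds = average_clip_seconds(TOTAL_HOURS, NUM_RECORDINGS)
print(f"Average length per recording: about {avg_seconds:.1f} s")
```

So a candidate dataset would need on the order of tens of millions of short clips to match that scale, which may help when comparing the options.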

marianna13 commented 1 year ago

Hi! I would suggest Juno, Genius (long captions), Free Music Archive (short captions), and Jamendo (mixed).

ukemamaster commented 1 year ago

@marianna13 Thanks, Mariana!