YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

About the audio-text pair of AudioSet dataset. #39

Open blue-blue272 opened 2 weeks ago

blue-blue272 commented 2 weeks ago

AudioSet only contains audio clips and event labels. How did you obtain the captions for the audio clips in the AudioSet dataset?

YuanGongND commented 2 weeks ago

Please check this: https://github.com/XinhaoMei/WavCaps. It is mentioned in the paper, but probably not in a very obvious place.
