YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

About the audio-text pairs of the AudioSet dataset. #39

Open blue-blue272 opened 4 months ago

blue-blue272 commented 4 months ago

AudioSet only contains audio clips and event labels. How did you obtain the caption descriptions for the audio in the AudioSet dataset?

YuanGongND commented 4 months ago

Please check this: https://github.com/XinhaoMei/WavCaps. It is mentioned in the paper, but probably not in a very obvious place.

-Yuan