Dataset Plan

@rvencu @rom1504 We need more data for the next step. In order of priority, the data we need is:

For audio data with natural-language text descriptions, we still need:

MACS - Multi-Annotator Captioned Soundscapes: a dataset containing audio captions and corresponding audio tags for 3930 audio files from the TAU Urban Acoustic Scenes 2019 development dataset (airport, public square, and park). The files were annotated using a web-based tool.
Freesound: scrape audio and text descriptions from Freesound. It is OK if the texts are a bit noisy.
High-quality sound effect libraries of similar quality to BBC Sound Effects: for example https://www.sound-ideas.com/Default.aspx or https://www.boomlibrary.com/, which have high-quality text descriptions of the audio rather than just tags and labels.
Music review websites: such as Pitchfork.

For audio data with other kinds of labels, we need to collect new large datasets and also convert our current tag-labeled datasets.

The top-priority datasets are large ones whose labels are easy to turn into a text description:

(All of the following datasets have tag labels for the audio.)

The datasets we currently have that need converting labels to text are:

We should come up with a unified way of converting tags to text. We could follow how CLIP did this when converting classification labels to natural text.
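As a starting point for discussion, here is a minimal sketch of a template-based conversion. It is modeled on CLIP's "a photo of a {label}." prompts. The audio templates and the helper names `join_tags` and `tags_to_text` are assumptions for illustration, not an agreed format.

```python
# Hypothetical audio templates, by analogy with CLIP's "a photo of a {label}."
TEMPLATES = [
    "The sound of {tags}.",
    "An audio recording of {tags}.",
]


def join_tags(tags):
    """Normalize tags and join them into one natural-language phrase."""
    # Replace underscores, trim whitespace, lowercase, and drop empty tags.
    cleaned = [t.replace("_", " ").strip().lower() for t in tags if t.strip()]
    if not cleaned:
        return ""
    if len(cleaned) == 1:
        return cleaned[0]
    if len(cleaned) == 2:
        return " and ".join(cleaned)
    return ", ".join(cleaned[:-1]) + ", and " + cleaned[-1]


def tags_to_text(tags, template=TEMPLATES[0]):
    """Turn a list of tag labels into a single caption using a template."""
    phrase = join_tags(tags)
    if not phrase:
        raise ValueError("at least one non-empty tag is required")
    return template.format(tags=phrase)


if __name__ == "__main__":
    print(tags_to_text(["Dog_bark"]))
    print(tags_to_text(["rain", "thunder", "wind"], TEMPLATES[1]))
```

Keeping the templates in one shared list would let every tag-labeled dataset go through the same function, so the captions stay consistent across datasets.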

LAION-AI / audio-dataset