LAION-AI / audio-dataset

Audio Dataset for training CLAP and other models

Dataset Plan #5

Open lukewys opened 2 years ago

lukewys commented 2 years ago

@rvencu @rom1504 We need more data for the next step. In order of priority, the data we need is:

  1. Audio data with natural text description(s).
  2. Audio data with other labels, for which we "make up" a text description of the audio.

For audio data with natural text description, we further need:

For audio data with other labels, we need to collect new large datasets while also converting our current tag-labeled datasets.

The top-priority datasets are those that are large and whose labels are easy to turn into a text description:

(All of the following datasets have tag labels for the audio.)

The datasets we currently have that need converting labels to text are:

We should come up with a unified way of converting tags to text. We could reference how CLIP did that (in converting classification to natural text).
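As a rough illustration of what a unified tag-to-text step could look like, here is a minimal sketch in the spirit of CLIP's "a photo of a {label}" prompts. The templates and the function names (`join_tags`, `tags_to_caption`) are made up for this example and are not something the project has agreed on.

```python
import random

# Hypothetical prompt templates, loosely modeled on CLIP's
# "a photo of a {label}" prompts. These are assumptions, not project decisions.
TEMPLATES = [
    "The sound of {}.",
    "A recording of {}.",
    "An audio clip containing {}.",
]


def join_tags(tags):
    """Join tags into a readable phrase: 'a', 'a and b', or 'a, b, and c'."""
    # Normalize: underscores become spaces, lowercase, drop empty tags.
    cleaned = [t.replace("_", " ").strip().lower() for t in tags if t.strip()]
    if not cleaned:
        return ""
    if len(cleaned) == 1:
        return cleaned[0]
    if len(cleaned) == 2:
        return f"{cleaned[0]} and {cleaned[1]}"
    return ", ".join(cleaned[:-1]) + f", and {cleaned[-1]}"


def tags_to_caption(tags, rng=None):
    """Turn a list of tag labels into one caption using a random template."""
    phrase = join_tags(tags)
    if not phrase:
        return ""
    rng = rng or random
    return rng.choice(TEMPLATES).format(phrase)


if __name__ == "__main__":
    # Pass a seeded random.Random for reproducible template choice.
    print(tags_to_caption(["dog_bark", "Rain"], random.Random(0)))
```

Picking the template at random gives some variety in the generated captions; a fixed template would work just as well if we want the conversion to be deterministic.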

wwfcnu commented 1 year ago

For example, with the wesoundeffect datasets, using file names as captions seems like a bit of a stretch. For example: Bowling_Re-Rack_Machinery_All-Lanes-In-A-Row.wav,