caption quality

Hi, while looking at the captions I noticed a few things. Some captions contain raw probabilities or label artifacts: searching for "0." or "(" (i.e. numbers in brackets) matches about 6% of captions combined, "probability" about 2.5%, and "label" about 4.6%. All proportions are relative to the full training data. Since you probably still have the full data used to generate the captions, could you regenerate these captions and release them as a v2 or similar?
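For anyone who wants to reproduce these numbers, here is a minimal sketch of the check. It assumes the captions are already loaded into a Python list of strings (loading depends on how the caption files are distributed, so it is left out). The function name `pattern_proportions` and the toy captions are hypothetical, and the matching is case-insensitive, which may differ slightly from how the figures above were computed.

```python
import re
from typing import Dict, List

# Patterns from this issue; "0." and "(" are combined into one check.
# Matching is case-insensitive (an assumption, not necessarily how the
# percentages above were computed).
PATTERNS: Dict[str, str] = {
    "0. or (": r"0\.|\(",
    "probability": r"probability",
    "label": r"label",
}


def pattern_proportions(captions: List[str], patterns: Dict[str, str]) -> Dict[str, float]:
    """Return, for each pattern, the fraction of captions with at least one match."""
    total = len(captions)
    if total == 0:
        return {name: 0.0 for name in patterns}
    result: Dict[str, float] = {}
    for name, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for caption in captions if regex.search(caption))
        result[name] = hits / total
    return result


if __name__ == "__main__":
    # Hypothetical toy captions, not taken from the dataset.
    toy_captions = [
        "A dog barks loudly in a park, creating a lively atmosphere.",
        "Speech (0.87) and music (0.12) are heard.",
        "A car engine with label probability 0.9.",
        "Rain falls on a roof.",
    ]
    for name, share in pattern_proportions(toy_captions, PATTERNS).items():
        print(f"{name}: {share:.1%}")
```

Running it over the full caption list instead of the toy list should give proportions comparable to the ones quoted above.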

A lot of captions (33%) contain the word "creating", which separates a literal description of the sounds from a more high-level description. Have you tested how this influences the model's ability to learn higher-level audio understanding?
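The "creating" pattern can be checked in a similar way. The sketch below, again assuming a plain list of caption strings, computes the share of captions containing the word and splits each caption at its first occurrence into a literal part and a high-level part. The helper names `split_on_creating` and `creating_fraction` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Whole-word, case-insensitive match for "creating".
CREATING = re.compile(r"\bcreating\b", re.IGNORECASE)


def split_on_creating(caption: str) -> Tuple[str, Optional[str]]:
    """Split a caption at the first "creating" into (literal, high_level).

    If the word does not occur, the high-level part is None.
    """
    match = CREATING.search(caption)
    if match is None:
        return caption.strip(), None
    # Drop the trailing ", " that usually precedes "creating".
    literal = caption[: match.start()].rstrip(" ,")
    high_level = caption[match.start():].strip()
    return literal, high_level


def creating_fraction(captions: List[str]) -> float:
    """Fraction of captions that contain the word "creating"."""
    if not captions:
        return 0.0
    return sum(1 for caption in captions if CREATING.search(caption)) / len(captions)


if __name__ == "__main__":
    # Hypothetical example caption, not taken from the dataset.
    print(split_on_creating("A dog barks loudly in a park, creating a lively atmosphere."))
```

One possible use, if someone wants to test the question above, is to build a literal-only variant of the captions (keeping just the first element of each split) and compare a model trained on it against one trained on the full captions.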

I am still downloading the audio data for the dataset, so I haven't been able to test this myself by training a model yet.

What do you think? Thanks!

LoieSun / Auto-ACD, issue #4