Open alexanderwerning opened 3 weeks ago
Hi, thank you for your interest. We will provide the pipeline code soon, and there are currently no plans to develop v2.
To reduce the impact of these caption patterns on the model, we augmented the captions during training by applying a 25% random mask to the words. https://github.com/LoieSun/Auto-ACD/blob/08a2fd9dd3e4f2e81bc9cadff727b7aff1945d6e/laion_clap/training/data.py#L612
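For readers who want a rough picture of this kind of augmentation, here is a minimal sketch. It is not the code at the link above: the function name `mask_caption`, the choice to drop (rather than replace) masked words, and the rule of always keeping at least one word are all assumptions made for illustration.

```python
import random


def mask_caption(caption, mask_prob=0.25, rng=None):
    """Drop each word independently with probability mask_prob.

    Hypothetical helper: the real training code may mask differently,
    for example by replacing words with a mask token.
    """
    rng = rng or random.Random()
    words = caption.split()
    # Keep a word when the random draw is at or above the mask probability.
    kept = [w for w in words if rng.random() >= mask_prob]
    # Assumption: never return an empty caption for a non-empty input.
    if not kept and words:
        kept = [rng.choice(words)]
    return " ".join(kept)


if __name__ == "__main__":
    example = "Rain falls on a tin roof, creating a calm atmosphere"
    print(mask_caption(example, mask_prob=0.25, rng=random.Random(0)))
```

Because words are dropped independently, the share of masked words is about 25% on average rather than exactly 25% per caption.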
Hi, looking at the captions I noticed a few things. Some captions contain raw probabilities: check for "0." or "(" to find numbers in brackets (together about 6%), or for "probability" (about 2.5%) or "label" (4.6%). The given proportions are relative to the full training data. Since you probably still have the full data used to generate the captions, could you regenerate these captions and release them as a v2 or similar?
A lot of captions (33%) contain the word "creating", which separates a literal description from a higher-level one. Have you tested how this affects whether the model learns a higher-level audio understanding?
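In case it helps to reproduce these numbers, here is a rough sketch of the kind of check I mean. The pattern names, the exact regular expressions, and the sample captions are my own illustration, not taken from the dataset or the repository.

```python
import re

# Hypothetical patterns approximating the substring checks described above.
PATTERNS = {
    "decimal_or_bracket": re.compile(r"0\.|\("),
    "probability": re.compile(r"probability", re.IGNORECASE),
    "label": re.compile(r"label", re.IGNORECASE),
    "creating": re.compile(r"\bcreating\b", re.IGNORECASE),
}


def pattern_shares(captions):
    """Return, per pattern, the fraction of captions that match it."""
    total = len(captions)
    if total == 0:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for c in captions if pattern.search(c)) / total
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    # Made-up captions for illustration only.
    sample = [
        "A dog barks (0.87) in a park",
        "Rain falls on a roof, creating a calm mood",
        "The label is speech with probability 0.9",
        "Birds sing in a forest",
    ]
    for name, share in pattern_shares(sample).items():
        print(f"{name}: {share:.1%}")
```

A caption can match several patterns at once, so the shares do not need to sum to 100%.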
I am still in the process of downloading the audio data for the dataset, so I have not yet been able to test this by training a model myself.
What do you think? Thanks!