Closed: sakemin closed this issue 11 months ago
Hi, can you explain the process of fine-tuning the melody model? What is the format of the data source, and how should we proceed with it? Thank you for your time and consideration!
Hey, I am also wondering the same thing. Is the dataset for training the melody model the same as the purely text-to-audio models?
Hello @rohandubey and @cvillela ,
Yes, the dataset format is the same as for the pure text-to-audio models (`MusicDataset` plus metadata). Since the melody model loads and uses the audio file as a `WavCondition` (`self_wav` in the code), the dataset structure is unchanged. However, the wav files are converted into one-hot melody chroma vectors, which are then cached to storage, so you need to specify a storage path for the cache. Refer here: the default `cache.path` is `None`, and once its value is set to your own path, the code will start caching the melody chroma conditions. Hope you succeed in training the melody model.
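For intuition about what "one-hot melody chroma" means, here is a minimal NumPy sketch. This is illustrative only, not audiocraft's actual conditioner (which additionally involves stem separation and a proper chroma filterbank): per analysis frame, the dominant pitch class (0–11) is picked and one-hot encoded.

```python
import numpy as np

def one_hot_chroma(wav: np.ndarray, sr: int, n_fft: int = 2048, hop: int = 512) -> np.ndarray:
    """Illustrative one-hot melody chroma: for each frame, pick the
    dominant pitch class (0..11) and one-hot encode it.
    Not audiocraft's implementation, just the idea behind it."""
    n_frames = 1 + max(0, (len(wav) - n_fft) // hop)
    window = np.hanning(n_fft)
    chroma = np.zeros((n_frames, 12))
    # Map each FFT bin to a pitch class (A440 reference).
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    midi = 69 + 12 * np.log2(np.maximum(freqs, 1e-6) / 440.0)
    pitch_class = np.round(midi).astype(int) % 12
    for t in range(n_frames):
        frame = wav[t * hop : t * hop + n_fft] * window
        mag = np.abs(np.fft.rfft(frame))
        mag[0] = 0.0  # drop the DC bin
        energy = np.zeros(12)
        np.add.at(energy, pitch_class, mag)  # sum magnitudes per pitch class
        chroma[t, int(np.argmax(energy))] = 1.0  # one-hot dominant class
    return chroma

# A 440 Hz sine should land on pitch class 9 (A) in every frame.
sr = 16000
t = np.arange(sr) / sr
wav = np.sin(2 * np.pi * 440.0 * t)
c = one_hot_chroma(wav, sr)
print(c.shape, int(c.sum(axis=0).argmax()))  # frames x 12, dominant class 9 (A)
```

In audiocraft this conversion happens once per file and the result is what gets written to `cache.path`, which is why the cache must be regenerated if you change the segment duration or chroma settings.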
Sake
Now I'm trying to fine-tune the melody model, aiming to use the same music file from the dataset as the `WavCondition` for the melody condition.
But I found that the `WavCondition` has weird values, and `audiocraft/solvers/musicgen.py` line 263 says that I must use the chroma cache. So I dug into the caching part and found the CACHE GENERATION JOBS in `audiocraft/grids/musicgen/musicgen_melody_32khz.py`.
If I set `dataset.segment_duration` to my audio file length, and set `model/lm/model_scale` to `'medium'`, which is the size of the melody model, will it work as I expect? Or do I have a big misunderstanding of the CACHE GENERATION JOBS in the grids file? The model size there is set to `'xsmall'`, so I also wonder whether this is a separate model used only for the (cache) generation jobs... I want to use the dataset audio file, which may be the target audio, as the input `WavCondition` to make melody chroma out of it.
Is there any way I can assign input wav files through the built-in cache system, so that they end up as the `WavCondition`?
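For reference, the overrides being discussed would be passed as dora config overrides roughly like this. This is a sketch only: the solver, conditioner, and `dset` names are placeholders for your own setup and are not verified against the repo; only `dataset.segment_duration`, `model/lm/model_scale`, and `cache.path` come from the discussion above.

```shell
# Hypothetical invocation; replace the solver/dset/path values with your own.
dora run solver=musicgen/musicgen_melody_32khz \
    model/lm/model_scale=medium \
    dataset.segment_duration=30 \
    cache.path=/path/to/chroma_cache \
    dset=audio/my_dataset
```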
Thanks.