facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

How can I assign the wav condition cache with my training dataset when training the melody model? #261

Closed sakemin closed 11 months ago

sakemin commented 12 months ago

I'm trying to fine-tune the melody model, and I'd like to use the same music file from the dataset as the WavCondition for the melody conditioning.

But I found the WavCondition holding strange values, and audiocraft/solvers/musicgen.py line 263 says that I must use the chroma cache.

So I dug into the caching code and found the CACHE GENERATION JOBS section in audiocraft/grids/musicgen/musicgen_melody_32khz.py.

If I set 'dataset.segment_duration' to the length of my audio files and set 'model/lm/model_scale' to 'medium' (the size of the melody model), will it work as I expect? Or do I have a big misunderstanding of the CACHE GENERATION JOBS in the grids file? The model scale there is set to 'xsmall', so I also suspect this might be a separate model used only for the cache generation jobs...
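For reference, the section I'm talking about looks roughly like this (trimmed from audiocraft/grids/musicgen/musicgen_melody_32khz.py; I'm quoting from memory, so the exact values may differ in your checkout):

```python
# CACHE GENERATION JOBS (trimmed). `launcher` and `cache_path` are defined
# earlier in the grid file; cache_path is a dict like
# {'conditioners.self_wav.chroma_stem.cache_path': <shared cache dir>}.
n_cache_gen_jobs = 4
gen_sub = launcher.slurm(gpus=1)
gen_sub.bind_(
    cache_path, {
        'dataset.segment_duration': 2,
        'dataset.train.permutation_on_files': True,  # try not to repeat files
        'optim.epochs': 10,
        'model/lm/model_scale': 'xsmall',
    })
with gen_sub.job_array():
    for gen_job in range(n_cache_gen_jobs):
        gen_sub({})
```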

I want to use the dataset audio file itself, which would also be the target audio, as the input WavCondition, so that the melody chroma is extracted from it.

Is there any way to assign input wav files through the built-in cache system, so that they get used as the WavCondition?

Thanks.

rohandubey commented 11 months ago

Hi, can you explain the process of fine-tuning the melody model? What is the format of the data source, and how should we proceed with it? Thank you for your time and consideration!

cvillela commented 11 months ago

Hey, I am also wondering the same thing. Is the dataset for training the melody model the same as for the purely text-to-audio models?

sakemin commented 11 months ago

Hello @rohandubey and @cvillela ,

Yes, the dataset format is the same as for the pure text-to-audio models (MusicDataset and metadata). Since the melody model loads the audio file itself as the WavCondition ("self_wav" in the code), the dataset structure does not change.

However, the wav files are converted into one-hot melody chroma vectors and then cached on disk, so you need to specify a storage path for the caches. Refer here: the default cache path is None, and once you set it to your own path, the code will start caching the melody chroma conditions.

Hope you succeed in training the melody model.
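If a concrete example helps, a manifest entry for MusicDataset looks like the sketch below. The field names mirror the example manifests shipped in egs/example, and `egs/my_dataset` is just a placeholder name, so double-check everything against your checkout. The per-track descriptive metadata (title, genre, description, ...) lives in a separate .json file next to each wav.

```python
# Hypothetical helper: write an egs-style data.jsonl manifest for your own
# dataset. Field names mirror egs/example/data.jsonl; verify them against
# your audiocraft version.
import json
import os

tracks = [
    {"path": "/data/music/track1.wav", "duration": 180.0,
     "sample_rate": 32000, "amplitude": None, "weight": None,
     "info_path": None},
]
os.makedirs("egs/my_dataset", exist_ok=True)  # placeholder dataset name
with open("egs/my_dataset/data.jsonl", "w") as f:
    for t in tracks:
        f.write(json.dumps(t) + "\n")
```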
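And here is a minimal sketch of how I wire in the cache path, modeled on the existing grid files. The override key `conditioners.self_wav.chroma_stem.cache_path` and the `chroma2music` conditioner name are taken from the melody grid as I remember it, and `/your/cache/dir` plus the dset name are placeholders, so verify the keys against your config:

```python
# my_melody_ft.py - hypothetical dora grid, modeled on the explorers in
# audiocraft/grids/musicgen/. Drop it next to them so the relative import works.
from ._explorers import LMExplorer


@LMExplorer
def explorer(launcher):
    launcher.slurm_(gpus=8)
    launcher.bind_(solver='musicgen/musicgen_base_32khz')
    launcher.bind_(conditioner='chroma2music')  # melody (chroma) conditioning
    launcher.bind_({'model/lm/model_scale': 'medium'})  # melody model size
    # Once cache_path is set (the default is null), the solver starts
    # caching the melody chroma conditions under this directory.
    launcher.bind_({'conditioners.self_wav.chroma_stem.cache_path':
                    '/your/cache/dir'})  # placeholder path
    launcher.bind_(dset='my/dataset')  # placeholder dset name
    launcher()
```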

Sake