Open vican9000 opened 10 months ago
Related to this question, I'm curious what the requirements are for training a stereo MusicGen model. A while ago I tested the 48 kHz stereo EnCodec, but it does not seem to be supported by MusicGen because of the normalization values that accompany the latent codes. Can anyone give advice or guidance on stereo MusicGen training? Is it actually possible, or do the stereo EnCodec models always come with the normalization values?
First of all, great project!
One question though: in the original paper, you mention using a four-quantizer EnCodec for MusicGen training, with a pretty large stride (a 50 Hz frame rate). This produces fairly low-quality output (and it is mono and 32 kHz only). Have you done any ablation studies with larger bandwidths? For instance, in the EnCodec paper you trained a stereo 48 kHz, 24 kbit/s model. What were the issues with using it in MusicGen?
@adefossez hopefully you can shed some light here. Thanks!
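For context, here is a rough back-of-the-envelope comparison of the two token streams being discussed. It is only a sketch: the codebook sizes (2048 for the 32 kHz MusicGen tokenizer, 1024 for the 48 kHz EnCodec), the 150 Hz frame rate, and the 16 codebooks at 24 kbit/s are my assumptions from the published configurations, not numbers stated in this thread, and `rvq_bitrate` is a hypothetical helper.

```python
import math

def rvq_bitrate(frame_rate_hz, n_codebooks, codebook_size):
    """Bits per second of a residual-VQ token stream (per audio stream)."""
    bits_per_code = math.log2(codebook_size)
    return frame_rate_hz * n_codebooks * bits_per_code

# MusicGen tokenizer: 50 Hz frames, 4 codebooks (codebook size 2048 is assumed).
musicgen_bps = rvq_bitrate(50, 4, 2048)
musicgen_tokens_per_s = 50 * 4

# EnCodec 48 kHz at 24 kbit/s (assumed: 150 Hz frames, 16 codebooks of 1024 entries).
encodec48_bps = rvq_bitrate(150, 16, 1024)
encodec48_tokens_per_s = 150 * 16

print(f"MusicGen tokenizer: {musicgen_bps / 1000:.1f} kbit/s, "
      f"{musicgen_tokens_per_s} tokens/s")
print(f"EnCodec 48 kHz:     {encodec48_bps / 1000:.1f} kbit/s, "
      f"{encodec48_tokens_per_s} tokens/s")
```

If those assumptions hold, the 48 kHz model gives the language model roughly 12 times as many tokens per second of audio to predict, which may be part of the trade-off behind the choice of a lower-bandwidth tokenizer.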