Open Apple-jun opened 3 months ago
@Apple-jun I'm currently working on a project where I need to take vocal audio as the input and generate instrumental music as the output using the musicgen-melody model. I understand that the target audio and the conditioning audio are typically the same, but in my case they need to be different. Has there been any progress on this?
From what I can tell, when fine-tuning musicgen-melody the dataset takes the same form as for the other text-to-music models: the target audio and the conditioning audio (the one converted to a chromagram) are identical. How can I make them different? I gave it a try but found the code hard to modify. I would appreciate any help.
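For what it's worth, one way to decouple the two is at the dataset level: yield the instrumental waveform as the target and a separate vocal waveform for the chroma condition. The sketch below is purely illustrative and assumes you adapt audiocraft's data pipeline yourself; `PairedAudioDataset` and the keys `wav` / `melody_wav` are my own names, not part of audiocraft's API.

```python
# Hypothetical sketch: a dataset that pairs a conditioning waveform (vocals)
# with a *different* target waveform (instrumental). In the stock musicgen-melody
# training setup the same audio plays both roles; here they are decoupled.
import torch
from torch.utils.data import Dataset


class PairedAudioDataset(Dataset):
    """Yields dicts with a target waveform, a conditioning waveform, and text.

    targets:    list of instrumental waveforms, each [channels, samples]
    conditions: list of vocal waveforms, each [channels, samples]
    texts:      list of text descriptions
    """

    def __init__(self, targets, conditions, texts):
        assert len(targets) == len(conditions) == len(texts)
        self.targets = targets
        self.conditions = conditions
        self.texts = texts

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        return {
            "wav": self.targets[idx],            # what the LM is trained to generate
            "melody_wav": self.conditions[idx],  # what the chroma extractor should see
            "description": self.texts[idx],
        }
```

On the training side you would then route `melody_wav` (instead of `wav`) into whatever computes the chroma features, while `wav` still goes through the EnCodec tokenizer as the prediction target; both waveforms must share the same sample rate and duration so the chroma frames stay aligned with the target tokens.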