facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
MIT License

What is the "reference" element in the generate step during training? #272

Open sakemin opened 11 months ago

sakemin commented 11 months ago

Hello, I'm currently fine-tuning MusicGen with the dora run command. I set generate.every = 1 so that a generation step runs every epoch.

Each generation step outputs one JSON file and one WAV file.

The JSON file contains a "reference" element, like this:

"reference": {
    "id": "3ce288e9d7c8e658c9004067ac98f1de970d7dd9",
    "path": "/mnt/nvme/tmp/audiocraft_sake/xps/454f0a30/samples/reference/3ce288e9d7c8e658c9004067ac98f1de970d7dd9.wav",
    "duration": 30.0
  },

The output WAV file seems to begin similarly to the reference WAV file, but the reference does not appear to be given as a prompt (I set generate.lm.prompted_samples = False, and the JSON file shows "prompt": null). What is this "reference", and what does it do?

Thank you

adefossez commented 11 months ago

I think it is the WAV file in the dataset that matches the description used to generate the sample, even without prompting.
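
For anyone who wants to check this on their own run, here is a minimal sketch that reads one of the generated sample JSON files and reports which dataset file it references and whether an audio prompt was used. It assumes only the top-level "reference" and "prompt" keys shown in the question; describe_sample is a hypothetical helper written for illustration, not part of audiocraft.

```python
import json
from pathlib import Path


def describe_sample(json_path):
    """Summarize the 'reference' and 'prompt' entries of one sample JSON file.

    Hypothetical helper: it assumes the JSON layout quoted in the question,
    i.e. a top-level "reference" object with "path" and "duration" fields
    and a top-level "prompt" entry that is null when no prompt was used.
    """
    meta = json.loads(Path(json_path).read_text())
    reference = meta.get("reference")  # dataset clip metadata, or None if absent
    prompt = meta.get("prompt")        # None when the sample was not prompted
    return {
        "reference_path": reference["path"] if reference else None,
        "reference_duration": reference["duration"] if reference else None,
        "was_prompted": prompt is not None,
    }
```

Calling describe_sample on the JSON file written next to a generated WAV should show was_prompted as False when generate.lm.prompted_samples = False, while reference_path still points at the dataset clip whose description was used.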

jbmaxwell commented 11 months ago

Ah, interesting. That makes sense; I suppose they have to get the descriptions used for generation from somewhere, and this is the easiest way to do that.