chavinlo / musicgen_trainer

simple trainer for musicgen/audiocraft
GNU Affero General Public License v3.0

codes concatenation #6

Closed Beinabih closed 1 year ago

Beinabih commented 1 year ago

Hi,

Thank you for your code. It has been very helpful to me in writing my own trainer.

I think the line codes = torch.cat([audio, audio], dim=0) (here) should be codes = torch.cat([codes, codes], dim=0); otherwise you won't use your encoded codebooks, right? :)

kind regards, Jonas

chavinlo commented 1 year ago

um... no...

The codebook is obtained via preprocess_audio, which takes both the model and the audio waveform as inputs, encodes the waveform with EnCodec (here called compression_model), and returns the codes.

Here is where we call that function with the model and the audio waveform (a tensor) as arguments, getting the codebook in return: https://github.com/chavinlo/musicgen_trainer/blob/5fd56f0a34f8b3ee7d5249e28a75b7e92349323f/train.py#L139

Note that the codebook here is just one batch; we then concatenate it with itself to match the shape of the condition_tensors.
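The flow described above can be sketched with toy tensors (a minimal sketch; the codebook shape and the condition_tensors batch size of 2 are assumptions for illustration, not values from the trainer):

```python
import torch

# Toy stand-in for the codebook returned by preprocess_audio. In the
# trainer the return value is bound to a variable named `audio`, but it
# already holds EnCodec codes, not the raw waveform.
# Assumed shape: [batch=1, n_codebooks=4, seq_len=10].
codes = torch.randint(0, 2048, (1, 4, 10))

# Duplicate the single batch along dim 0 so it matches the batch
# dimension of the condition_tensors (assumed here to be 2):
codes = torch.cat([codes, codes], dim=0)
print(codes.shape)  # torch.Size([2, 4, 10])
```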

chavinlo commented 1 year ago

Feel free to correct me if I'm wrong, because the trainer does not work for anything outside overfitting at the moment.

Beinabih commented 1 year ago

Ah, sorry for the confusion, you're totally right...

I moved the model.compression_model.encode(wav) call out of the process_audio function since I am using the Audiocraft Dataloader. I think I hallucinated some of my own code into your trainer.py.