acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Fixed loss during prior training #106

Closed by chebmarcel 1 year ago

chebmarcel commented 2 years ago

Hello,

It seems that I'm missing something in the training procedure. I trained a RAVE model for about 650K steps (after which it seemed to plateau).

Then I exported it and started training the prior. Weirdly, I am already at 500K steps and the loss is not decreasing; it seems to be stuck between 3.19 and 3.12.

When I try to get audio samples in Tensorboard, the prior model outputs noise, whereas the RAVE one sounds OK.

I am not sure what I'm doing wrong. I read some people talking about phase 2 kicking in, but I'm also confused whether this refers to the second stage (the GAN training) or something happening within the first stage.

If someone could shed some light on these questions, that would be super helpful. Thanks!

moiseshorta commented 2 years ago

It seems you need to keep training until the second phase, which by default kicks in after 1 million steps; you can also set a custom value with the --warmup flag.
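
Roughly, the switch between the two phases looks like this (a toy sketch with hypothetical names, not the repository's actual code):

```python
# Toy sketch of RAVE's two-phase schedule (hypothetical names).
WARMUP = 1_000_000  # default length of phase 1

def current_phase(step: int, warmup: int = WARMUP) -> str:
    # Phase 1: train encoder + decoder as a plain VAE
    # (multiscale spectral distance + KL regularization).
    # Phase 2: freeze the encoder and fine-tune the decoder
    # with adversarial (GAN) losses.
    return "representation" if step < warmup else "adversarial"
```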

chebmarcel commented 2 years ago

@moiseshorta Thanks for your reply! Indeed, after phase 2 kicks in I get much better results. I am still a bit confused about the prior training, since phase 2 already seems to use a GAN framework. What exactly does the prior training add on top of the RAVE model, and how long do you train it for?

moiseshorta commented 2 years ago

@chebmarcel The prior is actually another neural network, a type of RNN, which basically tries to predict the most likely next latent variable of your pre-trained RAVE model. It is needed if you want to perform unconditional generation (as opposed to timbre transfer).
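
Conceptually it works something like this (a toy sketch, not the actual prior architecture in this repo):

```python
import torch
import torch.nn as nn

class ToyLatentPrior(nn.Module):
    """Toy autoregressive prior over RAVE latents: given z_1..z_t,
    predict z_{t+1}. Illustration only."""
    def __init__(self, latent_dim: int = 128, hidden_size: int = 512):
        super().__init__()
        self.rnn = nn.GRU(latent_dim, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, latent_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, time, latent_dim), latents from a frozen RAVE encoder
        h, _ = self.rnn(z)
        return self.proj(h)  # predicted next latent at each time step

# Training: minimize e.g. the MSE between the prediction at step t and z[t+1].
# Generation: sample latents autoregressively, then run the sequence through
# RAVE's decoder to get audio (unconditional generation).
```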

chebmarcel commented 2 years ago

@moiseshorta OK great, it's clear now! Last question: do you train the prior for as many steps as the RAVE model, or what would be a good step ratio? Thanks!

moiseshorta commented 2 years ago

@chebmarcel That's up to you. I usually train the prior beyond 1M steps, but it really depends on your dataset and how the loss converges.

chebmarcel commented 2 years ago

Hi @moiseshorta, sorry for reopening haha. I trained the RAVE model for over 2M steps and it gives me very good results. Nevertheless, when I then train the prior, its loss is still not going down. I don't really understand why. Does this mean that prior training is just not effective for my dataset? Thanks

lang216 commented 2 years ago

Hi, do you mind showing the distance graph of your training? I don't know why after 1M steps my distance graph starts increasing.

chebmarcel commented 2 years ago

No worries, this is normal. Up to 1M steps it is the warmup phase; the distance logs after 1M don't really matter, as it is a different phase.

[Screenshot: distance curve over training steps]
lang216 commented 2 years ago

OMG, it makes so much sense now lol! I just checked train.py and saw that the 1M is hard-coded in there. Thank you! So do I just need to check the validation graph now?

Also, do you know how the steps (the 3353 below) are calculated? My batch size is 8 (the default), and the sample size is 14376.

Epoch 308: 35% 1160/3353 [09:48<-1:50:12, -3.72it/s, v_num=24]

chebmarcel commented 2 years ago

The second phase uses a GAN framework, so you want to check the loss_dis and loss_gen curves now. An epoch usually means one pass over all of the training data: for instance, if you have 20,000 examples and a batch size of 100, then an epoch contains 20,000 / 100 = 200 steps.
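
Applied to your numbers, 3353 steps per epoch with batch size 8 would mean roughly 3353 × 8 ≈ 26,824 preprocessed examples (I'm inferring that; your 14376 figure is probably counting something else). A quick sanity check:

```python
# Steps per epoch = dataset size / batch size (the trainer rounds up).
import math

batch_size = 8
dataset_size = 26_824          # inferred from 3353 steps x batch size 8
steps_per_epoch = math.ceil(dataset_size / batch_size)
print(steps_per_epoch)         # -> 3353
```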