IAHispano / Applio

VITS-based Voice Conversion focused on simplicity, quality and performance.
https://applio.org
MIT License
1.36k stars, 230 forks

Overtrain fix #431

Closed rappc87 closed 2 months ago

rappc87 commented 2 months ago

Overtrain detection now works as it should: if lowest_value has not decreased within overtrain_threshold epochs of the epoch where it was last recorded, the model is considered overtrained and training stops.
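To make the stopping rule concrete, here is a minimal sketch of it in Python. It is not the code from this pull request: the class `OvertrainDetector` and its method names are hypothetical, and it assumes the loss being tracked is a single float per epoch and that training stops once the current epoch exceeds the best epoch plus the threshold.

```python
from dataclasses import dataclass


@dataclass
class OvertrainDetector:
    """Hypothetical helper illustrating the rule; not part of Applio."""

    threshold: int                      # overtraining threshold, in epochs
    lowest_value: float = float("inf")  # lowest loss seen so far
    best_epoch: int = 0                 # epoch at which lowest_value was recorded

    def update(self, epoch: int, loss: float) -> bool:
        """Record this epoch's loss and return True if training should stop."""
        if loss < self.lowest_value:
            # A new minimum resets the countdown.
            self.lowest_value = loss
            self.best_epoch = epoch
        # Assumed rule: stop once more than `threshold` epochs have passed
        # since the best epoch without any improvement.
        return epoch > self.best_epoch + self.threshold

    def remaining(self, epoch: int) -> int:
        """Epochs left before the detector would report overtraining."""
        return max(0, self.best_epoch + self.threshold - epoch)


if __name__ == "__main__":
    detector = OvertrainDetector(threshold=2)
    # Made-up loss values, one per epoch, purely for illustration.
    for epoch, loss in enumerate([5.0, 4.0, 4.5, 4.2, 4.1], start=1):
        if detector.update(epoch, loss):
            print(f"stopping at epoch {epoch}, best epoch was {detector.best_epoch}")
            break
        print(f"epoch {epoch}: {detector.remaining(epoch)} epochs remaining")
```

Keeping `best_epoch` rather than a decrementing counter means the remaining-epochs figure can be recomputed at any point, which may be convenient when resuming a run.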

rappc87 commented 2 months ago

Added saving of best_epoch every time lowest_value changes.

aitronssesin commented 2 months ago

I don't think this works. I started training with an overtraining threshold of 10, and it started saving a lot of models, seemingly at random, but then it removed all the saved models. Training was also very slow.

C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=1 | step=40 | time=12:46:47 | training_speed=0:00:27
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_1e_40s_best_epoch.pth' (epoch 1 and step 40)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=2 | step=80 | time=12:47:14 | training_speed=0:00:22 | lowest_value=27.87265396118164 (epoch 2 and step 69) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_2e_80s_best_epoch.pth' (epoch 2 and step 80)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=3 | step=120 | time=12:47:41 | training_speed=0:00:21 | lowest_value=25.2319278717041 (epoch 3 and step 116) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_3e_120s_best_epoch.pth' (epoch 3 and step 120)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=4 | step=160 | time=12:48:08 | training_speed=0:00:21 | lowest_value=18.749269485473633 (epoch 4 and step 150) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_4e_160s_best_epoch.pth' (epoch 4 and step 160)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=5 | step=200 | time=12:48:33 | training_speed=0:00:21 | lowest_value=16.182971954345703 (epoch 5 and step 175) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_5e_200s_best_epoch.pth' (epoch 5 and step 200)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=6 | step=240 | time=12:48:57 | training_speed=0:00:20 | lowest_value=16.182971954345703 (epoch 5 and step 175) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=7 | step=280 | time=12:49:17 | training_speed=0:00:20 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_7e_280s_best_epoch.pth' (epoch 7 and step 280)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=8 | step=320 | time=12:49:42 | training_speed=0:00:20 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=9 | step=360 | time=12:50:03 | training_speed=0:00:21 | lowest_value=13.52884292602539 (epoch 7 and step 268) | Number of epochs remaining for overtraining: 8
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1\G_400.pth' (epoch 10)
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1\D_400.pth' (epoch 10)
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_10e_400s.pth' (epoch 10 and step 400)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=10 | step=400 | time=12:50:42 | training_speed=0:00:38 | lowest_value=12.121013641357422 (epoch 10 and step 377) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_10e_400s_best_epoch.pth' (epoch 10 and step 400)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=11 | step=440 | time=12:51:07 | training_speed=0:00:18 | lowest_value=11.236621856689453 (epoch 11 and step 400) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_11e_440s_best_epoch.pth' (epoch 11 and step 440)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=12 | step=480 | time=12:51:31 | training_speed=0:00:20 | lowest_value=10.71220874786377 (epoch 12 and step 454) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_12e_480s_best_epoch.pth' (epoch 12 and step 480)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=13 | step=520 | time=12:51:56 | training_speed=0:00:20 | lowest_value=10.71220874786377 (epoch 12 and step 454) | Number of epochs remaining for overtraining: 9
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=14 | step=560 | time=12:52:16 | training_speed=0:00:20 | lowest_value=9.570732116699219 (epoch 14 and step 536) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_14e_560s_best_epoch.pth' (epoch 14 and step 560)
C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1 | epoch=15 | step=600 | time=12:52:42 | training_speed=0:00:21 | lowest_value=8.836076736450195 (epoch 15 and step 560) | Number of epochs remaining for overtraining: 10
Saved model 'C:\Users\Aitor\Downloads\IA_Applio\Applio\logs\Prueba1_15e_600s_best_epoch.pth' (epoch 15 and step 600)

Saved files (the first epoch is the sync graph): [image]

rappc87 commented 2 months ago

You said this fix doesn't work, but I would like to summarize how it works.

From the screenshot I see that you have set it to save every 10 epochs.

If you set the overtraining threshold to 10 epochs, training continues for up to another 10 epochs after the last recorded best_epoch.pth file. If there is no improvement in that time, training finishes because the model is overtraining.

Every time a new best_epoch is found, it deletes the old best_epoch file, because the previous best_epoch.pth file is no longer the best epoch.

So, to summarize: if current_epoch > best_epoch + overtraining_threshold_value, training stops because of overtraining. Every time a new best epoch is found, best_epoch.pth is saved and the previous best-epoch file is deleted.
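The save-and-delete behaviour described above can be sketched roughly as follows. This is an illustration, not the code in this pull request: `save_best_epoch` is a hypothetical function, the model is stood in for by raw bytes so the example runs without PyTorch, and the file name simply mirrors the `<name>_<epoch>e_<step>s_best_epoch.pth` pattern visible in the log.

```python
import tempfile
from pathlib import Path
from typing import Optional


def save_best_epoch(
    model_bytes: bytes,
    out_dir: Path,
    name: str,
    epoch: int,
    step: int,
    previous: Optional[Path] = None,
) -> Path:
    """Write a new best-epoch file and delete the one it supersedes.

    Hypothetical helper: a real trainer would serialize the model itself
    instead of writing placeholder bytes.
    """
    new_path = out_dir / f"{name}_{epoch}e_{step}s_best_epoch.pth"
    new_path.write_bytes(model_bytes)
    # Remove the old file only after the new one is on disk, so that at
    # least one best-epoch checkpoint always exists.
    if previous is not None and previous != new_path:
        previous.unlink(missing_ok=True)
    return new_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        logs = Path(tmp)
        best = save_best_epoch(b"weights-a", logs, "Prueba1", 5, 200)
        best = save_best_epoch(b"weights-b", logs, "Prueba1", 7, 280, previous=best)
        # Only the most recent best-epoch file should be left in the folder.
        print(sorted(p.name for p in logs.iterdir()))
```

With this scheme a folder that briefly fills with best_epoch files and then empties down to one is expected: each new minimum replaces the previous file rather than adding to it.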

aitronssesin commented 2 months ago

We're gonna merge this pull request and give it a spin. If the overtraining detector looks sharper, we'll roll with the changes.