coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] events.out.tfevents is not generated when using --restore_path flag in python TTS/TTS/bin/train_tacotron.py #563

Closed Sadam1195 closed 2 years ago

Sadam1195 commented 3 years ago

Describe the bug Each training run is supposed to generate an events.out.tfevents file for TensorBoard, but when I use --restore_path, no events.out.tfeventsxxxx file is generated.

To Reproduce Steps to reproduce the behavior:

  1. Run the following command:
    CUDA_VISIBLE_DEVICES="0" python TTS/TTS/bin/train_tacotron.py \
        --restore_path ./Results/ljspeech-ddc-June-09-2021_08+36PM-0000000/best_model.pth.tar \
        --config_path ./tacotron2-DDC.json \
        --coqpit.output_path ./Results \
        --coqpit.datasets.0.path ./Italian_dataset/it_IT/by_book/male/riccardo_fasol/il_ritratto_del_diavolo \
        --coqpit.audio.stats_path ./scale_stats.npy
  2. No runtime or compile errors occur.

Expected behavior It should generate an events.out.tfeventsxxxx file in the training run directory by default.

Environment (please complete the following information):

erogol commented 3 years ago

when you call --restore_path, it starts a new training in a different folder with the restored model. Did you check this new folder?

Sadam1195 commented 3 years ago

when you call --restore_path, it starts a new training in a different folder with the restored model. Did you check this new folder?

Yes, I did. The new folder had all the other files and folders, but no events.out.tfevents file.
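The check discussed above (looking for an events file in the new run folder) can be sketched in a few lines of Python. This is only an illustration using the standard library: the helper name `find_event_files` is hypothetical, and `./Results` is simply the output path from the command in this issue. It searches every run folder under the output path and reports each event file's size and last-modified time, which also helps to see whether a file is still being updated.

```python
from datetime import datetime
from pathlib import Path


def find_event_files(output_path):
    """Return (path, size_bytes, modified) for each event file, newest first.

    Hypothetical helper: it only inspects the file system and does not
    depend on TTS or TensorBoard.
    """
    root = Path(output_path)
    if not root.is_dir():
        return []
    found = []
    # Event files are named events.out.tfevents.<suffix>, so match on the prefix.
    for path in root.rglob("events.out.tfevents*"):
        if path.is_file():
            stat = path.stat()
            found.append((path, stat.st_size, datetime.fromtimestamp(stat.st_mtime)))
    # Newest modification time first.
    found.sort(key=lambda item: item[2], reverse=True)
    return found


if __name__ == "__main__":
    # "./Results" is the --coqpit.output_path value used in this issue.
    for path, size, modified in find_event_files("./Results"):
        print(f"{modified:%Y-%m-%d %H:%M:%S}  {size:>12,d} B  {path}")
```

An empty result for the newest run folder would match the behaviour reported here.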

Sadam1195 commented 3 years ago

Just to make sure I will run the command again to see if the problem is consistent. If the issue appears, I will reopen the issue. I am closing it for now.

Ca-ressemble-a-du-fake commented 2 years ago

Hi, it seems that the events.out.tfevents file is not updated when continuing training of a VITS model. Even when training from scratch the behaviour is odd: sometimes the events.out file is updated only once after 70k steps (and then you can, e.g., listen to test sentences at many points in time).

I am using the latest version (0.5) from early January 2022.

Ca-ressemble-a-du-fake commented 2 years ago

That's weird! Looking at the folder just after writing the above comment, I see that a new events.out.tfevents file has appeared (1 day after starting the continued training). Its size is 569 MB and its creation date is yesterday. Maybe the events.out.tfevents file is only written once it is big enough? If so, would adding many test sentences make it grow faster and be updated more often?

Sadam1195 commented 2 years ago

Maybe events.out.tfevents file is only output when its size is big enough?

When I last trained models on Colab, I noticed that my I/O operations limit was maxed out, so the events.out.tfevents file was not being written to the drive in real time. The updated events.out.tfevents file was only written to the drive once training was interrupted or stopped. Maybe it's a TF bug, but AFAIK in my case it was due to Colab's I/O operation limits being maxed out.

@Ca-ressemble-a-du-fake

Ca-ressemble-a-du-fake commented 2 years ago

Ah ok, that explains what I saw then! Actually, the events file was created when the training stopped while I was writing the comment! Thank you for your answer!

TejaswiniiB commented 2 years ago

Hi @Ca-ressemble-a-du-fake,

  1. For me, .pth checkpoints are not being generated when the --restore_path argument is used. Do you know why? I'm also using Google Colab. The rest of the files are being updated, though, as shown in the picture below.
  2. And how do I make use of the events.out.tfevents file?

image

A checkpoint has now been written at step 280000. How can we make it write checkpoints more frequently?
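On the second question: events.out.tfevents files are TensorBoard logs, and they are normally viewed by pointing TensorBoard at the folder that contains them. A minimal sketch, assuming the `tensorboard` command-line tool is installed in the same environment; the helper name `tensorboard_command` is hypothetical, and `./Results` is the output path used earlier in this issue.

```python
import shlex


def tensorboard_command(logdir, port=6006):
    """Build the command line that starts TensorBoard for a log folder.

    Hypothetical helper; --logdir and --port are standard TensorBoard
    options, and 6006 is TensorBoard's default port.
    """
    return ["tensorboard", "--logdir", str(logdir), "--port", str(port)]


if __name__ == "__main__":
    cmd = tensorboard_command("./Results")
    # Print the command instead of running it, so the sketch has no side effects.
    # To launch it from Python, pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Pointing `--logdir` at the parent output folder rather than a single run lets TensorBoard show every run found underneath it, which is convenient when `--restore_path` creates a new run folder.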

Sadam1195 commented 2 years ago

Edit save_step in the config.json file to set your desired step interval for saving the model. @TejaswiniiB
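The edit can be made by hand, or scripted as in the sketch below. It rests on two assumptions: `save_step` is a top-level key in the config file, and the file is plain JSON without comments (a config containing `//` comments would not load with the standard `json` module). The helper name `set_save_step` and the example value are hypothetical.

```python
import json
from pathlib import Path


def set_save_step(config_path, save_step):
    """Rewrite the config file with a new checkpoint interval (in steps).

    Assumes `save_step` is a top-level key and the file is plain JSON.
    """
    save_step = int(save_step)
    if save_step <= 0:
        raise ValueError("save_step must be a positive number of steps")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["save_step"] = save_step
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


# Example (hypothetical value): save a checkpoint every 1000 steps.
# set_save_step("./tacotron2-DDC.json", 1000)
```

A smaller interval means more frequent checkpoints and therefore more disk writes, which may matter if I/O limits are already a bottleneck, as described earlier in the thread.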