Closed Sadam1195 closed 2 years ago
When you call --restore_path, it starts a new training run in a different folder with the restored model. Did you check this new folder?
Yes, I did. The new folder had all the other files and folders except for the events.out.tfevents file.
Just to make sure I will run the command again to see if the problem is consistent. If the issue appears, I will reopen the issue. I am closing it for now.
Hi,
It seems that the events.out.tfevents file is not updated when continuing training of a VITS model. Even when training from scratch the behaviour is odd: sometimes the events.out.tfevents file is updated only once after 70k steps (and only then can you, e.g., listen to test sentences at many points in time).
I am using the latest version (0.5) from early January 2022.
That's weird! Looking at the folder just after writing the above comment, I see that a new events.out.tfevents file has just appeared (1 day after starting the continued training). Its size is 569MB and its creation date is yesterday. Maybe the events.out.tfevents file is only written once it is big enough? So adding many test sentences would make it grow faster and be updated more often?
Maybe events.out.tfevents file is only output when its size is big enough ?
When I last trained models on Colab, I noticed that my I/O operations limit was maxed out, so the events.out.tfevents file was not being written to Drive in real time. Only when training was interrupted or stopped was the updated events.out.tfevents file written to Drive. Maybe it's a tf bug, but AFAIK in my case it was due to Colab's I/O operation limits being maxed out.
@Ca-ressemble-a-du-fake
Ah ok, that explains what I saw then! Actually, the events file was created when the training stopped while I was writing the comment! Thank you for your answer!
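To check whether an events file has actually reached the run folder (for example on a mounted Drive where writes may be delayed, as described above), a small standard-library sketch like the following could list matching files with their size and modification time. The function name list_event_files is hypothetical; only the events.out.tfevents file name prefix comes from this thread.

```python
from datetime import datetime
from pathlib import Path


def list_event_files(run_dir):
    """Return (name, size_bytes, modified) for each TensorBoard events file.

    Hypothetical helper: it only looks for files whose name starts with
    "events.out.tfevents" anywhere under run_dir.
    """
    results = []
    for path in sorted(Path(run_dir).rglob("events.out.tfevents*")):
        if path.is_file():
            stat = path.stat()
            # st_mtime is the last modification time, not the creation time.
            modified = datetime.fromtimestamp(stat.st_mtime)
            results.append((path.name, stat.st_size, modified))
    return results
```

Calling list_event_files on the new run folder and getting an empty list would suggest that nothing has been flushed to disk yet. Once a file is present, TensorBoard can be pointed at the folder with tensorboard --logdir followed by the folder path.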
Hi @Ca-ressemble-a-du-fake,
- For me, .pth checkpoints are not being generated when the --restore_path argument is used. Do you know why? I'm also using Google Colab. The rest of the files are being updated, though, as shown in the picture below.
- And how do I make use of the events.out.tfevents file?
A checkpoint has now been written at step 280000. How can we make it write checkpoints more frequently?
Edit save_step in the config.json file to your desired step interval for saving the model.
@TejaswiniiB
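The save_step suggestion above could be applied by hand or scripted. The following is a minimal sketch, assuming config.json is plain JSON without comments and that save_step is a top-level key; the function name set_save_step is hypothetical and not part of the library.

```python
import json
from pathlib import Path


def set_save_step(config_path, save_step):
    """Rewrite the "save_step" entry of a JSON config file.

    Assumption: the file is plain JSON (no comments) and "save_step"
    is a top-level key, as the reply above suggests.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["save_step"] = int(save_step)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return config
```

For example, set_save_step("config.json", 1000) would request a checkpoint every 1000 steps; the value is only illustrative. The change presumably needs to be made before the training run is started or continued.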
Describe the bug
Each training run is supposed to generate an events.out.tfevents file for TensorBoard, but when I use --restore_path it doesn't generate the events.out.tfeventsxxxx file.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
It should generate an events.out.tfeventsxxxx file in the training run directory by default.
Environment (please complete the following information):
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.5 LTS
PyTorch or TensorFlow version (use command below): PyTorch 1.4.0, TensorFlow 2.1.0
Exact command to reproduce: