ZachB100 / Piper-Training-Guide-with-Screen-Reader

A guide to help newcomers to the Piper TTS system create voices for NVDA and other screen readers down the line.

Errors at training start #4

Open GA3Dtech opened 1 week ago

GA3Dtech commented 1 week ago

Hi,

I wanted to test this interesting Piper TTS system to clone my voice. I first tried to run it locally, installing everything in a .venv with Python 3.10, etc., using my RTX 2000 Ada. I tried many approaches and always got errors from PyTorch or pytorch-lightning when starting the training. I thought it was a problem with this laptop GPU, so I decided to try your cool notebook running on Google Colab, and I get a similar error. Do you have any idea where it could come from? I have the feeling some packages have been updated and broke something.

Are you still able to run it without any issues?

```
DEBUG:piper_train:Namespace(dataset_dir='/content/drive/MyDrive/colab/piper/Test_voixdAlain', checkpoint_epochs=5, quality='medium', resume_from_single_speaker_checkpoint=None, logger=True, enable_checkpointing=True, default_root_dir=None, gradient_clip_val=None, gradient_clip_algorithm=None, num_nodes=1, num_processes=None, devices='1', gpus=None, auto_select_gpus=False, tpu_cores=None, ipus=None, enable_progress_bar=True, overfit_batches=0.0, track_grad_norm=-1, check_val_every_n_epoch=1, fast_dev_run=False, accumulate_grad_batches=None, max_epochs=10000, min_epochs=None, max_steps=-1, min_steps=None, max_time=None, limit_train_batches=None, limit_val_batches=None, limit_test_batches=None, limit_predict_batches=None, val_check_interval=None, log_every_n_steps=1000, accelerator='gpu', strategy=None, sync_batchnorm=False, precision=32, enable_model_summary=True, weights_save_path=None, num_sanity_val_steps=2, resume_from_checkpoint='/content/pretrained.ckpt', profiler=None, benchmark=None, deterministic=None, reload_dataloaders_every_n_epochs=0, auto_lr_find=False, replace_sampler_ddp=True, detect_anomaly=False, auto_scale_batch_size=False, plugins=None, amp_backend='native', amp_level=None, move_metrics_to_cpu=False, multiple_trainloader_mode='max_size_cycle', batch_size=12, validation_split=0.01, num_test_examples=1, max_phoneme_ids=None, hidden_channels=192, inter_channels=192, filter_channels=768, n_layers=6, n_heads=2, seed=1234, num_ckpt=0, save_last=True)
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py:52: LightningDeprecationWarning: Setting `Trainer(resume_from_checkpoint=)` is deprecated in v1.5 and will be removed in v1.7. Please pass `Trainer.fit(ckpt_path=)` directly instead.
  rank_zero_deprecation(
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
DEBUG:piper_train:Checkpoints will be saved every 5 epoch(s)
DEBUG:piper_train:0 Checkpoints will be saved
DEBUG:vits.dataset:Loading dataset: /content/drive/MyDrive/colab/piper/Test_voixdAlain/dataset.jsonl
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py:731: LightningDeprecationWarning: `trainer.resume_from_checkpoint` is deprecated in v1.5 and will be removed in v2.0. Specify the fit checkpoint path with `trainer.fit(ckpt_path=)` instead.
  ckpt_path = ckpt_path or self.resume_from_checkpoint
Missing logger folder: /content/drive/MyDrive/colab/piper/Test_voixdAlain/lightning_logs
Restoring states from the checkpoint path at /content/pretrained.ckpt
DEBUG:fsspec.local:open file: /content/pretrained.ckpt
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/content/piper/src/python/piper_train/__main__.py", line 173, in <module>
    main()
  File "/content/piper/src/python/piper_train/__main__.py", line 150, in main
    trainer.fit(model)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 696, in fit
    self._call_and_handle_interrupt(
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 650, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 735, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1110, in _run
    self._restore_modules_and_callbacks(ckpt_path)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1065, in _restore_modules_and_callbacks
    self._checkpoint_connector.restore_model()
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 182, in restore_model
    self.trainer.strategy.load_model_state_dict(self._loaded_checkpoint)
  File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 343, in load_model_state_dict
    self.lightning_module.load_state_dict(checkpoint["state_dict"])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VitsModel:
	Unexpected key(s) in state_dict: "model_g.emb_g.weight", "model_g.dec.cond.weight", "model_g.dec.cond.bias", "model_g.enc_q.enc.cond_layer.bias", "model_g.enc_q.enc.cond_layer.weight_g", "model_g.enc_q.enc.cond_layer.weight_v", "model_g.flow.flows.0.enc.cond_layer.bias", "model_g.flow.flows.0.enc.cond_layer.weight_g", "model_g.flow.flows.0.enc.cond_layer.weight_v", "model_g.flow.flows.2.enc.cond_layer.bias", "model_g.flow.flows.2.enc.cond_layer.weight_g", "model_g.flow.flows.2.enc.cond_layer.weight_v", "model_g.flow.flows.4.enc.cond_layer.bias", "model_g.flow.flows.4.enc.cond_layer.weight_g", "model_g.flow.flows.4.enc.cond_layer.weight_v", "model_g.flow.flows.6.enc.cond_layer.bias", "model_g.flow.flows.6.enc.cond_layer.weight_g", "model_g.flow.flows.6.enc.cond_layer.weight_v", "model_g.dp.cond.weight", "model_g.dp.cond.bias".
```
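(All of the unexpected keys are speaker-conditioning weights: `model_g.emb_g.weight` is a speaker-embedding table, and the `.cond` layers condition the decoder, encoder, flow, and duration predictor on a speaker ID, so they exist only in multi-speaker checkpoints. A quick way to check a downloaded checkpoint before training is sketched below with plain PyTorch; the key names come from the traceback above and the path matches the Colab run, so adjust it for your setup.)

```python
# Sketch: inspect a Piper/VITS checkpoint to see whether it is multi-speaker.
import torch

ckpt = torch.load("/content/pretrained.ckpt", map_location="cpu")
state_dict = ckpt["state_dict"]

# A multi-speaker checkpoint carries a speaker-embedding table ("emb_g") plus
# ".cond" conditioning weights; a single-speaker checkpoint has neither.
speaker_keys = [k for k in state_dict if "emb_g" in k or ".cond" in k]
if speaker_keys:
    print(f"Multi-speaker checkpoint ({len(speaker_keys)} speaker-related keys):")
    for key in speaker_keys:
        print("  ", key)
else:
    print("Single-speaker checkpoint: safe for a single-speaker training run.")
```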

rmcpantoja commented 1 week ago

Hi @GA3Dtech, if you are training a single-speaker model (your voice only), you'll need to download a single-speaker checkpoint. You can use one of the three qualities made for Lessac.
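(For reference, a single-speaker fine-tuning run would then look roughly like the sketch below. The flag spellings follow the upstream Piper training docs, the values mirror the Namespace dump in the log above, and `/content/lessac_medium.ckpt` is a placeholder for whichever single-speaker Lessac checkpoint you download.)

```bash
# Sketch of a single-speaker fine-tuning run; the checkpoint path is illustrative.
python3 -m piper_train \
    --dataset-dir /content/drive/MyDrive/colab/piper/Test_voixdAlain \
    --accelerator gpu \
    --devices 1 \
    --batch-size 12 \
    --validation-split 0.01 \
    --num-test-examples 1 \
    --quality medium \
    --checkpoint-epochs 5 \
    --max_epochs 10000 \
    --precision 32 \
    --resume_from_checkpoint /content/lessac_medium.ckpt
```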

GA3Dtech commented 1 week ago

Thanks @rmcpantoja for your quick feedback,

I hadn't understood this point: I made my dataset in French, and the available French checkpoints are apparently multi-speaker, which I hadn't realized. Apparently you can also use the English checkpoints to train another language, so I'll use Lessac, the best standard one, and see how it goes. Anyway, it's working now. Thanks a lot.

rmcpantoja commented 1 week ago

@GA3Dtech, you can use siwis to fine-tune French voices. In this case Lessac works, but there are cases where it's preferable to fine-tune from a model of the same language, since the resulting voice will carry over some phonemes and consonant sounds from the base model.
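(The phoneme point is easy to see with espeak-ng, which Piper uses for phonemization: the same French sentence phonemizes differently under a French voice and an English one, so an English base checkpoint has never modeled the French-only sounds. A quick comparison, assuming espeak-ng is installed:)

```bash
# Print the IPA phonemes each espeak-ng voice produces for the same sentence.
# French nasal vowels and /ʁ/ appear only in the French output, so a voice
# fine-tuned from an English checkpoint starts with no training on them.
espeak-ng -q --ipa -v fr "Bonjour tout le monde"
espeak-ng -q --ipa -v en-us "Bonjour tout le monde"
```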