EveryVoiceTTS / EveryVoice

The EveryVoice TTS Toolkit - Text To Speech for your language
https://docs.everyvoice.ca

Setting `model.learn_alignment=false` breaks `everyvoice train text-to-spec` #452

Closed SamuelLarkin closed 2 months ago

SamuelLarkin commented 3 months ago

Relates to: https://github.com/roedoejet/EveryVoice/issues/451

How to reproduce

preprocessed/

Note that there is no duration/ subdirectory in preprocessed/:

lsd preprocessed/
drwxrwx--- sam037 nrc_ict 128 KB Wed Jun  5 10:05:09 2024  attn
drwxrwx--- sam037 nrc_ict  64 KB Wed Jun  5 10:05:03 2024  audio
drwxrwx--- sam037 nrc_ict  64 KB Wed Jun  5 10:05:11 2024  energy
.rw-rw---- sam037 nrc_ict 305 KB Wed Jun  5 10:05:03 2024  filelist.psv
drwxrwx--- sam037 nrc_ict  64 KB Wed Jun  5 10:05:13 2024  pitch
drwxrwx--- sam037 nrc_ict  64 KB Wed Jun  5 10:05:06 2024  spec
.rw-rw---- sam037 nrc_ict 403 B  Wed Jun  5 10:05:15 2024  stats.json
.rw-rw---- sam037 nrc_ict 386 B  Wed Jun  5 10:05:03 2024  summary.txt
.rw-rw---- sam037 nrc_ict 275 KB Wed Jun  5 10:05:13 2024  training_filelist.psv
.rw-rw---- sam037 nrc_ict  30 KB Wed Jun  5 10:05:13 2024  validation_filelist.psv
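For reference, a minimal pre-flight check (a sketch only, assuming the layout above; the column names in the filelist and the `--` separator and `duration.pt` suffix are inferred from the path in the error below and may need adjusting):

```python
from pathlib import Path

# Hypothetical check: verify that every entry in the training filelist has a
# corresponding duration tensor on disk before starting training.
preprocessed = Path("preprocessed")
duration_dir = preprocessed / "duration"

missing = []
with open(preprocessed / "training_filelist.psv", encoding="utf8") as f:
    header = f.readline().rstrip("\n").split("|")
    for line in f:
        row = dict(zip(header, line.rstrip("\n").split("|")))
        # Column names are assumptions; adjust to the actual filelist header.
        basename = row.get("basename", "")
        speaker = row.get("speaker", "default")
        language = row.get("language", "eng")
        expected = duration_dir / f"{basename}--{speaker}--{language}--duration.pt"
        if not expected.exists():
            missing.append(expected)

print(f"{len(missing)} duration files missing")
```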

Log

2024-06-05 10:09:07.352 | INFO     | everyvoice.utils:update_config_from_cli_args:170 - Updating config 'model.learn_alignment' to value 'false'
2024-06-05 10:09:07.353 | INFO     | everyvoice.config.utils:load_partials:47 - You have both the key preprocessing and path_to_preprocessing_config_file defined in your configuration. We will override values from path_to_preprocessing_config_file with values from preprocessing
2024-06-05 10:09:07.356 | INFO     | everyvoice.config.utils:load_partials:47 - You have both the key text and path_to_text_config_file defined in your configuration. We will override values from path_to_text_config_file with values from text
2024-06-05 10:09:07.396 | INFO     | everyvoice.utils:update_config_from_cli_args:170 - Updating config 'model.learn_alignment' to value 'false'
2024-06-05 10:09:07.396 | INFO     | everyvoice.config.utils:load_partials:47 - You have both the key preprocessing and path_to_preprocessing_config_file defined in your configuration. We will override values from path_to_preprocessing_config_file with values from preprocessing
2024-06-05 10:09:07.400 | INFO     | everyvoice.config.utils:load_partials:47 - You have both the key text and path_to_text_config_file defined in your configuration. We will override values from path_to_text_config_file with values from text
2024-06-05 10:09:07.416 | INFO     | everyvoice.base_cli.helpers:save_configuration_to_log_dir:139 - Configuration
...
2024-06-05 10:09:07.433 | INFO     | everyvoice.base_cli.helpers:train_base_command:176 - Loading modules for training...
...
2024-06-05 10:09:08.250 | INFO     | everyvoice.base_cli.helpers:train_base_command:386 - Model's architecture
...
----------------------------------------------------------------------------------------------------
distributed_backend=gloo
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

  | Name               | Type                | Params
-----------------------------------------------------------
0 | loss               | FastSpeech2Loss     | 0
1 | text_input_layer   | Embedding           | 20.5 K
2 | position_embedding | PositionalEmbedding | 0
3 | encoder            | Conformer           | 6.1 M
4 | variance_adaptor   | VarianceAdaptor     | 1.1 M
5 | decoder            | Conformer           | 6.1 M
6 | mel_linear         | Linear              | 20.6 K
-----------------------------------------------------------
13.3 M    Trainable params
510       Non-trainable params
13.3 M    Total params
53.287    Total estimated model params size (MB)
/home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=127` in the `DataLoader` to improve performance.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/model/feature_prediction/FastSpeech2_lightn │
│ ing/fs2/cli/train.py:28 in train                                                                 │
│                                                                                                  │
│   25 │                                                                                           │
│   26 │   model_kwargs = {"lang2id": lang2id, "speaker2id": speaker2id, "stats": stats}           │
│   27 │                                                                                           │
│ ❱ 28 │   train_base_command(                                                                     │
│   29 │   │   model_config=FastSpeech2Config,                                                     │
│   30 │   │   model=FastSpeech2,                                                                  │
│   31 │   │   data_module=FastSpeech2DataModule,                                                  │
│                                                                                                  │
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/base_cli/helpers.py:390 in                  │
│ train_base_command                                                                               │
│                                                                                                  │
│   387 │   │   tensorboard_logger.log_hyperparams(config.model_dump())                            │
│   388 │   │   trainer.validate_loop.verbose = True                                               │
│   389 │   │   trainer.fit_loop.epoch_loop.val_loop.verbose = True                                │
│ ❱ 390 │   │   trainer.fit(model_obj, data)                                                       │
│   391 │   │   # print("Post fitting")                                                            │
│   392 │   │   # trainer.validate(model=model_obj, datamodule=data, verbose=True)                 │
│   393 │   │   # trainer.validate(model=model_obj, verbose=True)                                  │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/tr │
│ ainer.py:543 in fit                                                                              │
│                                                                                                  │
│    540 │   │   self.state.fn = TrainerFn.FITTING                                                 │
│    541 │   │   self.state.status = TrainerStatus.RUNNING                                         │
│    542 │   │   self.training = True                                                              │
│ ❱  543 │   │   call._call_and_handle_interrupt(                                                  │
│    544 │   │   │   self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule,  │
│    545 │   │   )                                                                                 │
│    546                                                                                           │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/ca │
│ ll.py:43 in _call_and_handle_interrupt                                                           │
│                                                                                                  │
│    40 │   """                                                                                    │
│    41 │   try:                                                                                   │
│    42 │   │   if trainer.strategy.launcher is not None:                                          │
│ ❱  43 │   │   │   return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer,    │
│    44 │   │   return trainer_fn(*args, **kwargs)                                                 │
│    45 │                                                                                          │
│    46 │   except _TunerExitException:                                                            │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/strategies │
│ /launchers/subprocess_script.py:105 in launch                                                    │
│                                                                                                  │
│   102 │   │   │   _launch_process_observer(self.procs)                                           │
│   103 │   │                                                                                      │
│   104 │   │   _set_num_threads_if_needed(num_processes=self.num_processes)                       │
│ ❱ 105 │   │   return function(*args, **kwargs)                                                   │
│   106 │                                                                                          │
│   107 │   @override                                                                              │
│   108 │   def kill(self, signum: _SIGNUM) -> None:                                               │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/tr │
│ ainer.py:579 in _fit_impl                                                                        │
│                                                                                                  │
│    576 │   │   │   model_provided=True,                                                          │
│    577 │   │   │   model_connected=self.lightning_module is not None,                            │
│    578 │   │   )                                                                                 │
│ ❱  579 │   │   self._run(model, ckpt_path=ckpt_path)                                             │
│    580 │   │                                                                                     │
│    581 │   │   assert self.state.stopped                                                         │
│    582 │   │   self.training = False                                                             │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/tr │
│ ainer.py:986 in _run                                                                             │
│                                                                                                  │
│    983 │   │   # ----------------------------                                                    │
│    984 │   │   # RUN THE TRAINER                                                                 │
│    985 │   │   # ----------------------------                                                    │
│ ❱  986 │   │   results = self._run_stage()                                                       │
│    987 │   │                                                                                     │
│    988 │   │   # ----------------------------                                                    │
│    989 │   │   # POST-Training CLEAN UP                                                          │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/tr │
│ ainer.py:1030 in _run_stage                                                                      │
│                                                                                                  │
│   1027 │   │   │   return self.predict_loop.run()                                                │
│   1028 │   │   if self.training:                                                                 │
│   1029 │   │   │   with isolate_rng():                                                           │
│ ❱ 1030 │   │   │   │   self._run_sanity_check()                                                  │
│   1031 │   │   │   with torch.autograd.set_detect_anomaly(self._detect_anomaly):                 │
│   1032 │   │   │   │   self.fit_loop.run()                                                       │
│   1033 │   │   │   return None                                                                   │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/trainer/tr │
│ ainer.py:1059 in _run_sanity_check                                                               │
│                                                                                                  │
│   1056 │   │   │   call._call_callback_hooks(self, "on_sanity_check_start")                      │
│   1057 │   │   │                                                                                 │
│   1058 │   │   │   # run eval step                                                               │
│ ❱ 1059 │   │   │   val_loop.run()                                                                │
│   1060 │   │   │                                                                                 │
│   1061 │   │   │   call._call_callback_hooks(self, "on_sanity_check_end")                        │
│   1062                                                                                           │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/loops/util │
│ ities.py:182 in _decorator                                                                       │
│                                                                                                  │
│   179 │   │   else:                                                                              │
│   180 │   │   │   context_manager = torch.no_grad                                                │
│   181 │   │   with context_manager():                                                            │
│ ❱ 182 │   │   │   return loop_run(self, *args, **kwargs)                                         │
│   183 │                                                                                          │
│   184 │   return _decorator                                                                      │
│   185                                                                                            │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/loops/eval │
│ uation_loop.py:128 in run                                                                        │
│                                                                                                  │
│   125 │   │   │   │   │   dataloader_idx = data_fetcher._dataloader_idx                          │
│   126 │   │   │   │   else:                                                                      │
│   127 │   │   │   │   │   dataloader_iter = None                                                 │
│ ❱ 128 │   │   │   │   │   batch, batch_idx, dataloader_idx = next(data_fetcher)                  │
│   129 │   │   │   │   if previous_dataloader_idx != dataloader_idx:                              │
│   130 │   │   │   │   │   # the dataloader has changed, notify the logger connector              │
│   131 │   │   │   │   │   self._store_dataloader_outputs()                                       │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/loops/fetc │
│ hers.py:133 in __next__                                                                          │
│                                                                                                  │
│   130 │   │   │   │   self.done = not self.batches                                               │
│   131 │   │   elif not self.done:                                                                │
│   132 │   │   │   # this will run only when no pre-fetching was done.                            │
│ ❱ 133 │   │   │   batch = super().__next__()                                                     │
│   134 │   │   else:                                                                              │
│   135 │   │   │   # the iterator is empty                                                        │
│   136 │   │   │   raise StopIteration                                                            │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/loops/fetc │
│ hers.py:60 in __next__                                                                           │
│                                                                                                  │
│    57 │   │   assert self.iterator is not None                                                   │
│    58 │   │   self._start_profiler()                                                             │
│    59 │   │   try:                                                                               │
│ ❱  60 │   │   │   batch = next(self.iterator)                                                    │
│    61 │   │   except StopIteration:                                                              │
│    62 │   │   │   self.done = True                                                               │
│    63 │   │   │   raise                                                                          │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/utilities/ │
│ combined_loader.py:341 in __next__                                                               │
│                                                                                                  │
│   338 │                                                                                          │
│   339 │   def __next__(self) -> _ITERATOR_RETURN:                                                │
│   340 │   │   assert self._iterator is not None                                                  │
│ ❱ 341 │   │   out = next(self._iterator)                                                         │
│   342 │   │   if isinstance(self._iterator, _Sequential):                                        │
│   343 │   │   │   return out                                                                     │
│   344 │   │   out, batch_idx, dataloader_idx = out                                               │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/pytorch_lightning/utilities/ │
│ combined_loader.py:142 in __next__                                                               │
│                                                                                                  │
│   139 │   │   │   │   │   raise StopIteration                                                    │
│   140 │   │                                                                                      │
│   141 │   │   try:                                                                               │
│ ❱ 142 │   │   │   out = next(self.iterators[0])                                                  │
│   143 │   │   except StopIteration:                                                              │
│   144 │   │   │   # try the next iterator                                                        │
│   145 │   │   │   self._use_next_iterator()                                                      │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/utils/data/dataloader. │
│ py:630 in __next__                                                                               │
│                                                                                                  │
│    627 │   │   │   if self._sampler_iter is None:                                                │
│    628 │   │   │   │   # TODO(https://github.com/pytorch/pytorch/issues/76750)                   │
│    629 │   │   │   │   self._reset()  # type: ignore[call-arg]                                   │
│ ❱  630 │   │   │   data = self._next_data()                                                      │
│    631 │   │   │   self._num_yielded += 1                                                        │
│    632 │   │   │   if self._dataset_kind == _DatasetKind.Iterable and \                          │
│    633 │   │   │   │   │   self._IterableDataset_len_called is not None and \                    │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/utils/data/dataloader. │
│ py:674 in _next_data                                                                             │
│                                                                                                  │
│    671 │                                                                                         │
│    672 │   def _next_data(self):                                                                 │
│    673 │   │   index = self._next_index()  # may raise StopIteration                             │
│ ❱  674 │   │   data = self._dataset_fetcher.fetch(index)  # may raise StopIteration              │
│    675 │   │   if self._pin_memory:                                                              │
│    676 │   │   │   data = _utils.pin_memory.pin_memory(data, self._pin_memory_device)            │
│    677 │   │   return data                                                                       │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/utils/data/_utils/fetc │
│ h.py:51 in fetch                                                                                 │
│                                                                                                  │
│   48 │   │   │   if hasattr(self.dataset, "__getitems__") and self.dataset.__getitems__:         │
│   49 │   │   │   │   data = self.dataset.__getitems__(possibly_batched_index)                    │
│   50 │   │   │   else:                                                                           │
│ ❱ 51 │   │   │   │   data = [self.dataset[idx] for idx in possibly_batched_index]                │
│   52 │   │   else:                                                                               │
│   53 │   │   │   data = self.dataset[possibly_batched_index]                                     │
│   54 │   │   return self.collate_fn(data)                                                        │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/utils/data/_utils/fetc │
│ h.py:51 in <listcomp>                                                                            │
│                                                                                                  │
│   48 │   │   │   if hasattr(self.dataset, "__getitems__") and self.dataset.__getitems__:         │
│   49 │   │   │   │   data = self.dataset.__getitems__(possibly_batched_index)                    │
│   50 │   │   │   else:                                                                           │
│ ❱ 51 │   │   │   │   data = [self.dataset[idx] for idx in possibly_batched_index]                │
│   52 │   │   else:                                                                               │
│   53 │   │   │   data = self.dataset[possibly_batched_index]                                     │
│   54 │   │   return self.collate_fn(data)                                                        │
│                                                                                                  │
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/model/feature_prediction/FastSpeech2_lightn │
│ ing/fs2/dataset.py:134 in __getitem__                                                            │
│                                                                                                  │
│   131 │   │   │   │   │   │   f"{self.config.model.target_text_representation_level} have not    │
│   132 │   │   │   │   │   )                                                                      │
│   133 │   │   elif self.teacher_forcing or not self.inference:                                   │
│ ❱ 134 │   │   │   duration = self._load_file(                                                    │
│   135 │   │   │   │   basename, speaker, language, "duration", "duration.pt"                     │
│   136 │   │   │   )                                                                              │
│   137 │   │   else:                                                                              │
│                                                                                                  │
│ /fs/hestia_Hnrc/ict/sam037/git/EveryVoice/everyvoice/model/feature_prediction/FastSpeech2_lightn │
│ ing/fs2/dataset.py:51 in _load_file                                                              │
│                                                                                                  │
│    48 │   │   self.speaker2id = speaker2id                                                       │
│    49 │                                                                                          │
│    50 │   def _load_file(self, bn, spk, lang, dir, fn):                                          │
│ ❱  51 │   │   return torch.load(                                                                 │
│    52 │   │   │   self.preprocessed_dir / dir / self.sep.join([bn, spk, lang, fn])               │
│    53 │   │   )                                                                                  │
│    54                                                                                            │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/serialization.py:986   │
│ in load                                                                                          │
│                                                                                                  │
│    983 │   if 'encoding' not in pickle_load_args.keys():                                         │
│    984 │   │   pickle_load_args['encoding'] = 'utf-8'                                            │
│    985 │                                                                                         │
│ ❱  986 │   with _open_file_like(f, 'rb') as opened_file:                                         │
│    987 │   │   if _is_zipfile(opened_file):                                                      │
│    988 │   │   │   # The zipfile reader is going to advance the current file position.           │
│    989 │   │   │   # If we want to actually tail call to torch.jit.load, we need to              │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/serialization.py:435   │
│                                                                                                  │
│    432                                                                                           │
│    433 def _open_file_like(name_or_buffer, mode):                                                │
│    434 │   if _is_path(name_or_buffer):                                                          │
│ ❱  435 │   │   return _open_file(name_or_buffer, mode)                                           │
│    436 │   else:                                                                                 │
│    437 │   │   if 'w' in mode:                                                                   │
│    438 │   │   │   return _open_buffer_writer(name_or_buffer)                                    │
│                                                                                                  │
│ /home/sam037/.conda/envs/EveryVoice.sl/lib/python3.11/site-packages/torch/serialization.py:416   │
│ in __init__                                                                                      │
│                                                                                                  │
│    413                                                                                           │
│    414 class _open_file(_opener):                                                                │
│    415 │   def __init__(self, name, mode):                                                       │
│ ❱  416 │   │   super().__init__(open(name, mode))                                                │
│    417 │                                                                                         │
│    418 │   def __exit__(self, *args):                                                            │
│    419 │   │   self.file_like.close()                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/fs3c/nrc/dt/sam037/exp/EveryVoice/tiny.lj/resume/preprocessed/duration/LJ002-0111--default--eng--duration.pt'
Loading EveryVoice modules: 100%|██████████| 4/4 [00:33<00:00,  8.26s/it]
roedoejet commented 3 months ago

Yes - that's because if `learn_alignment` is set to `False`, you need to provide explicit durations from forced alignment, and the error shows that they're not found: `FileNotFoundError: [Errno 2] No such file or directory: '/gpfs/fs3c/nrc/dt/sam037/exp/EveryVoice/tiny.lj/resume/preprocessed/duration/LJ002-0111--default--eng--duration.pt'`. You can get them by aligning your data with the DeepForcedAligner or another aligner like MFA, but given how good the jointly learned alignment is, I don't think it's worth it. So we could add a slightly more informative error message, but this is working as expected.
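For illustration, one way the message could be made more informative (a hypothetical sketch, not the actual EveryVoice implementation) is to check for the file in the dataset's `_load_file` and point the user at forced alignment:

```python
import torch

# Hypothetical sketch only: wrap the torch.load in fs2/dataset.py's _load_file
# so a missing preprocessed file produces an actionable message instead of a
# bare FileNotFoundError.
def _load_file(self, bn, spk, lang, dir, fn):
    path = self.preprocessed_dir / dir / self.sep.join([bn, spk, lang, fn])
    if not path.exists():
        raise FileNotFoundError(
            f"Expected preprocessed file {path} but it does not exist. "
            "With model.learn_alignment=false, durations must come from a "
            "forced aligner (e.g. the DeepForcedAligner or MFA); otherwise "
            "set model.learn_alignment back to true."
        )
    return torch.load(path)
```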