TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

tensorflow-tts-preprocess: assert len(mel) == len(f0) == len(energy) AssertionError #49

Closed ZDisket closed 4 years ago

ZDisket commented 4 years ago

My dataset, which has always worked properly, gave me this error during the preprocessing step.

2020-06-19 20:34:52.303453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
[Preprocessing]:   0% 6/3431 [00:02<24:16,  2.35it/s]
[Preprocessing]:   0% 13/3431 [00:07<32:39,  1.74it/s]
[Preprocessing]:   0% 6/3431 [00:10<1:35:38,  1.68s/it]
[Preprocessing]:   2% 64/3431 [00:20<18:08,  3.09it/s]
[Preprocessing]:   1% 49/3431 [00:25<29:30,  1.91it/s]
[Preprocessing]:   2% 73/3431 [00:53<40:54,  1.37it/s]
[Preprocessing]:   3% 106/3431 [00:59<31:02,  1.78it/s]
[Preprocessing]:   0% 13/3431 [01:03<4:39:15,  4.90s/it]
[Preprocessing]:   2% 59/3431 [01:12<1:09:08,  1.23s/it]
[Preprocessing]:   6% 215/3431 [01:12<18:09,  2.95it/s]
[Preprocessing]:   6% 215/3431 [01:13<18:12,  2.94it/s]
[Preprocessing]:   0% 12/3431 [01:16<6:05:32,  6.41s/it]
[Preprocessing]:   1% 20/3431 [01:25<4:03:18,  4.28s/it]
[Preprocessing]:   2% 76/3431 [01:35<1:10:16,  1.26s/it]
[Preprocessing]:   5% 178/3431 [01:49<33:13,  1.63it/s]
[Preprocessing]:   6% 209/3431 [01:56<06:16,  8.55it/s]multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 217, in save_to_file
    assert len(mel) == len(f0) == len(energy)
AssertionError
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 250, in main
[Preprocessing]:   6% 209/3431 [01:56<29:58,  1.79it/s]
    p.map(save_to_file, range(len(processor.items)))
  File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
    raise self._value
AssertionError
[Preprocessing]:   0% 0/3431 [01:56<?, ?it/s]

When I run the normalization step, it only does 1314 iterations instead of the expected 3431. In addition, when I try to train, I get this:

<ipython-input-12-8616bad8c9dc> in dotrain(inargs, ptpath, maxsteps)
    371         energy_stat=args.energy_stat,
    372         mel_length_threshold=mel_length_threshold,
--> 373         return_utt_id=False
    374     ).create(
    375         is_shuffle=config["is_shuffle"],

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in __init__(self, root_dir, charactor_query, mel_query, duration_query, f0_query, energy_query, f0_stat, energy_stat, max_f0_embeddings, max_energy_embeddings, charactor_load_fn, mel_load_fn, duration_load_fn, f0_load_fn, energy_load_fn, mel_length_threshold, return_utt_id)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in <listcomp>(.0)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

IndexError: list index out of range

I have no idea why this happens. My dataset is formatted exactly the same (like LJSpeech) as it was back when it was working.
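A note on the IndexError above: line 103 indexes duration_files with the values in idxs, so the error means at least one index points past the end of duration_files, i.e. that list is shorter than the one the indices were built from. This is consistent with preprocessing having stopped early (the normalization step saw 1314 items instead of 3431), though the thread does not confirm which files are missing. A minimal sketch for comparing per-feature file counts follows; the helper name count_matches, the file names, and the glob patterns are all hypothetical and stand in for whatever queries the dataset class is given.

```python
import fnmatch
import os
import tempfile

def count_matches(root_dir, patterns):
    """Count files under root_dir matching each glob pattern (recursively).

    Hypothetical helper for illustration; not part of TensorFlowTTS.
    """
    counts = {name: 0 for name in patterns}
    for _, _, files in os.walk(root_dir):
        for fname in files:
            for name, pattern in patterns.items():
                if fnmatch.fnmatch(fname, pattern):
                    counts[name] += 1
    return counts

# Demo on a throwaway directory: three mel files but only two duration files.
with tempfile.TemporaryDirectory() as root:
    for fname in ["a-mel.npy", "b-mel.npy", "c-mel.npy", "a-dur.npy", "b-dur.npy"]:
        open(os.path.join(root, fname), "w").close()
    counts = count_matches(root, {"mel": "*-mel.npy", "duration": "*-dur.npy"})

print(counts)
if len(set(counts.values())) > 1:
    print("Feature counts differ; index-based filtering can raise IndexError.")
```

Running the same count against the real dump directory, with the patterns actually passed to the dataset, would show whether one feature type has fewer files than the others.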

dathudeptrai commented 4 years ago

@ZDisket I think the problem is with pw.dio :)) Somehow its output is shorter than the mel spectrogram. In the code I use f0 = f0[:len(mel)], but if len(f0) == len(mel) - 1, then assert len(mel) == len(f0) == len(energy) fails. Please help me check by printing the lengths of f0/mel/energy at the assert.
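To illustrate the failure mode described above, here is a minimal sketch with made-up sizes (the 100 and 99 frame counts and the 80 mel bins are assumptions, not values from the dataset). Slicing with f0[:len(mel)] can shorten f0 but never lengthen it, so an f0 track that is one frame short still fails the length check.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
mel = np.zeros((100, 80))   # 100 frames, 80 mel bins
energy = np.zeros(100)      # one energy value per frame
f0 = np.zeros(99)           # f0 track that came out one frame short

f0 = f0[:len(mel)]          # truncation is a no-op when f0 is already shorter
lengths_match = len(mel) == len(f0) == len(energy)
print(len(mel), len(f0), len(energy), lengths_match)
```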

ZDisket commented 4 years ago

@dathudeptrai I'll do that.

ZDisket commented 4 years ago

@dathudeptrai I have gone over to the assertion part and replaced it with this:

        if not len(mel) == len(f0) == len(energy):
          print("LEN NO MATCH: " + str(len(mel)) + " ; " + str(len(f0)) + " ; " + str(len(energy)))
        else:
          print("len " + str(len(mel)) + " ; " + str(len(f0)) + " ; " + str(len(energy)))

This is the dumped output from a partial run (not all my files), but it should give you enough information to work with: lenlog.txt

dathudeptrai commented 4 years ago

@ZDisket all the failed cases are because len(f0) == len(mel) - 1 :)) I will fix it.

dathudeptrai commented 4 years ago

@ZDisket can you try this? if len(f0) >= len(mel): f0 = f0[:len(mel)] else: f0 = np.pad(f0, ((0, 0), (0, len(mel) - len(f0))))

ZDisket commented 4 years ago

@dathudeptrai I replaced the f0 = f0[:len(mel)] after pw.dio with this:

        if len(f0) >= len(mel):
          f0 = f0[:len(mel)]
        else:
          f0 = np.pad(f0, ((0, 0), (0, len(mel) - len(f0))))

because your one-liner was giving syntax errors. I then got another error:

multiprocess.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/usr/local/lib/python3.6/dist-packages/pathos/helpers/mp_helper.py", line 15, in <lambda>
    func = lambda args: f(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 213, in save_to_file
    f0 = np.pad(f0, ((0, 0), (0, len(mel) - len(f0))))
  File "<__array_function__ internals>", line 6, in pad
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 748, in pad
    pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/arraypad.py", line 523, in _as_pairs
    return np.broadcast_to(x, (ndim, 2)).tolist()
  File "<__array_function__ internals>", line 6, in broadcast_to
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/stride_tricks.py", line 182, in broadcast_to
    return _broadcast_to(array, shape, subok=subok, readonly=True)
  File "/usr/local/lib/python3.6/dist-packages/numpy/lib/stride_tricks.py", line 127, in _broadcast_to
    op_flags=['readonly'], itershape=shape, order='C')
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (1,2)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/tensorflow-tts-preprocess", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_tts/bin/preprocess.py", line 252, in main
    p.map(save_to_file, range(len(processor.items)))
  File "/usr/local/lib/python3.6/dist-packages/pathos/multiprocessing.py", line 137, in map
    return _pool.map(star(f), zip(*args)) # chunksize
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/local/lib/python3.6/dist-packages/multiprocess/pool.py", line 644, in get
    raise self._value
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (2,2) and requested shape (1,2)

dathudeptrai commented 4 years ago

@ZDisket hah, sorry :) f0 is a 1-D vector, so it should be f0 = np.pad(f0, ((0, len(mel) - len(f0))))
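For reference, np.pad takes one (before, after) pair per axis, so a 1-D array needs a single pair, and passing two pairs is what triggers the broadcast ValueError in the traceback above. The sketch below wraps the truncate-or-pad logic in a helper; the name match_length and the sample values are hypothetical and not part of TensorFlowTTS.

```python
import numpy as np

def match_length(x, target_len):
    """Truncate or zero-pad a 1-D array so that len(x) == target_len.

    Hypothetical helper for illustration; not part of TensorFlowTTS.
    """
    if len(x) >= target_len:
        return x[:target_len]
    # A 1-D array has a single axis, so pad_width is one (before, after) pair.
    return np.pad(x, (0, target_len - len(x)))

short_f0 = np.array([110.0, 112.0, 115.0])
print(match_length(short_f0, 4))   # padded with one trailing zero
print(match_length(short_f0, 2))   # truncated to two values

# Passing two pairs to a 1-D array raises a ValueError, as in the traceback.
try:
    np.pad(short_f0, ((0, 0), (0, 1)))
except ValueError as err:
    print("ValueError:", err)
```

Zero-padding marks the extra frame as unvoiced, which matches the default behavior of np.pad in constant mode.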

ZDisket commented 4 years ago

@dathudeptrai This one worked well without any errors. Let me know when you commit the fix.

ZDisket commented 4 years ago

When trying to train, I still get:

2 frames
<ipython-input-13-8616bad8c9dc> in dotrain(inargs, ptpath, maxsteps)
    371         energy_stat=args.energy_stat,
    372         mel_length_threshold=mel_length_threshold,
--> 373         return_utt_id=False
    374     ).create(
    375         is_shuffle=config["is_shuffle"],

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in __init__(self, root_dir, charactor_query, mel_query, duration_query, f0_query, energy_query, f0_stat, energy_stat, max_f0_embeddings, max_energy_embeddings, charactor_load_fn, mel_load_fn, duration_load_fn, f0_load_fn, energy_load_fn, mel_length_threshold, return_utt_id)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

/content/TensorflowTTS/ttsexamples/fastspeech2/fastspeech2_dataset.py in <listcomp>(.0)
    101             mel_files = [mel_files[idx] for idx in idxs]
    102             charactor_files = [charactor_files[idx] for idx in idxs]
--> 103             duration_files = [duration_files[idx] for idx in idxs]
    104             mel_lengths = [mel_lengths[idx] for idx in idxs]
    105             f0_files = [f0_files[idx] for idx in idxs]

IndexError: list index out of range

I remembered that durations aren't needed for FastSpeech 2, so I changed the duration query to *.raw-feats.npy in the training function, and I got this:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in execution_mode(mode)
   1985       ctx.executor = executor_new
-> 1986       yield
   1987     finally:

14 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py in _next_internal(self)
    654             output_types=self._flat_output_types,
--> 655             output_shapes=self._flat_output_shapes)
    656 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_dataset_ops.py in iterator_get_next(iterator, output_types, output_shapes, name)
   2362     except _core._NotOkStatusException as e:
-> 2363       _ops.raise_from_not_ok_status(e, name)
   2364   # Add nodes to the TensorFlow graph.

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6652   # pylint: disable=protected-access
-> 6653   six.raise_from(core._status_to_exception(e.code, message), None)
   6654   # pylint: enable=protected-access

/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)

InvalidArgumentError: All elements in a batch must have the same rank as the padded shape for component1: expected rank 1 but got element with rank 2 [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-22-5c3a831be7f3> in <module>()
     27   inpt = ""
     28 
---> 29 dotrain(trainargs,inpt,6000)

<ipython-input-21-83135adb8e50> in dotrain(inargs, ptpath, maxsteps)
    442                     valid_dataset,
    443                     saved_path=os.path.join(config["outdir"], 'checkpoints/'),
--> 444                     resume=args.resume)
    445     except KeyboardInterrupt:
    446         trainer.save_checkpoint()

/content/TensorflowTTS/ttsexamples/fastspeech/train_fastspeech.py in fit(self, train_data_loader, valid_data_loader, saved_path, resume)
    273             self.load_checkpoint(resume)
    274             logging.info(f"Successfully resumed from {resume}.")
--> 275         self.run()
    276 
    277 

/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py in run(self)
     70                          desc="[train]")
     71         while True:
---> 72             self._train_epoch()
     73 
     74             if self.finish_train:

/content/TensorflowTTS/tensorflow_tts/trainers/base_trainer.py in _train_epoch(self)
     90     def _train_epoch(self):
     91         """Train model one epoch."""
---> 92         for train_steps_per_epoch, batch in enumerate(self.train_data_loader, 1):
     93             # one step training
     94             self._train_step(batch)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py in __next__(self)
    629 
    630   def __next__(self):  # For Python 3 compatibility
--> 631     return self.next()
    632 
    633   def _next_internal(self):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py in next(self)
    668     """Returns a nested structure of `Tensor`s containing the next element."""
    669     try:
--> 670       return self._next_internal()
    671     except errors.OutOfRangeError:
    672       raise StopIteration

/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/iterator_ops.py in _next_internal(self)
    659         return self._element_spec._from_compatible_tensor_list(ret)  # pylint: disable=protected-access
    660       except AttributeError:
--> 661         return structure.from_compatible_tensor_list(self._element_spec, ret)
    662 
    663   @property

/usr/lib/python3.6/contextlib.py in __exit__(self, type, value, traceback)
     97                 value = type()
     98             try:
---> 99                 self.gen.throw(type, value, traceback)
    100             except StopIteration as exc:
    101                 # Suppress StopIteration *unless* it's the same exception that

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py in execution_mode(mode)
   1987     finally:
   1988       ctx.executor = executor_old
-> 1989       executor_new.wait()
   1990 
   1991 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/executor.py in wait(self)
     65   def wait(self):
     66     """Waits for ops dispatched in this executor to finish."""
---> 67     pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
     68 
     69   def clear_error(self):

InvalidArgumentError: All elements in a batch must have the same rank as the padded shape for component1: expected rank 1 but got element with rank 2

dathudeptrai commented 4 years ago

Hi, I didn't use MFA to extract alignments :( I still use durations from Tacotron :(. I will fix the bug ASAP. I'm not at home right now, but I think the bug is easy to fix, so let's try to fix it :)) I haven't run into your problem myself.
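On the rank error reported above: the message says the batching step expected rank-1 elements for one component but received a rank-2 element. One plausible cause, assuming the *.raw-feats.npy files hold 2-D arrays of shape [frames, mel bins] while durations are 1-D, is that pointing the duration query at those files feeds 2-D arrays into the duration slot. A small sketch with NumPy stand-ins follows; the helper name check_rank and the shapes are assumptions for illustration.

```python
import numpy as np

def check_rank(arrays, expected_ndim):
    """Return the indices of arrays whose number of dimensions differs from expected_ndim."""
    return [i for i, a in enumerate(arrays) if np.ndim(a) != expected_ndim]

durations = np.array([3, 5, 2])    # rank 1: one duration per input token
raw_feats = np.zeros((10, 80))     # rank 2: frames x mel bins (assumed layout)

bad = check_rank([durations, raw_feats], expected_ndim=1)
print(bad)
```

A check like this on the loaded arrays, before they reach the batching step, would show which files have an unexpected rank.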