coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

Precomputing F0s takes a long time #2665

Closed arnav-newzera closed 1 year ago

arnav-newzera commented 1 year ago

Describe the bug

Precomputing F0s while finetuning fast_pitch takes a lot of time. This is my config:

config = FastPitchConfig(
    run_name="fast_pitch_ljspeech",
    audio=audio_config,
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=20,
    num_eval_loader_workers=20,
    compute_input_seq_cache=True,
    compute_f0=True,  # precompute F0 (pitch) values for the whole dataset
    f0_cache_path=os.path.join(output_path, "f0_cache"),  # where the precomputed F0s are cached
    run_eval=True,
    test_delay_epochs=-1,
    epochs=100,
    text_cleaner="english_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    precompute_num_workers=20,  # workers used for the phoneme/F0 precompute passes
    print_step=50,
    print_eval=False,
    mixed_precision=False,
    max_seq_len=500000,
    output_path=output_path,
    datasets=[dataset_config],
)
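
For context, here is a rough sense of the per-clip cost, assuming a pYIN-style extractor via librosa purely as an illustration (the extractor TTS actually uses may differ):

import time

import librosa

# Hypothetical timing of pYIN-based F0 extraction on one clip;
# the wav path is just an example, point it at a file from your dataset.
wav_path = "LJSpeech-1.1/wavs/LJ001-0001.wav"
y, sr = librosa.load(wav_path, sr=22050)

start = time.time()
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),
    fmax=librosa.note_to_hz("C7"),
    sr=sr,
)
print(f"F0 extraction took {time.time() - start:.2f}s for {len(y) / sr:.1f}s of audio")

A per-clip cost of even a second or two adds up over a whole dataset, even with 20 workers.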

This is my CPU specification:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2031.056
CPU max MHz:                     3800.0000
CPU min MHz:                     2200.0000
BogoMIPS:                        7600.68
Virtualization:                  AMD-V
L1d cache:                       768 KiB
L1i cache:                       768 KiB
L2 cache:                        12 MiB
L3 cache:                        128 MiB

GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    Off  | 00000000:21:00.0 Off |                  N/A |
| 41%   44C    P8    24W / 280W |    809MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA TITAN RTX    Off  | 00000000:49:00.0 Off |                  N/A |
| 41%   33C    P8    11W / 280W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1020266      C   python                            806MiB |
+-----------------------------------------------------------------------------+

Am I doing something wrong? Can I skip the F0 precomputation step, and is that recommended? Performance is important to me.

To Reproduce

Standard installation, then finetuning fast_pitch on an LJSpeech-format dataset.

Expected behavior

Faster execution of the F0 precomputation step.

Logs

No response

Environment

The config and environment are given in the bug description above.

Additional context

No response

erogol commented 1 year ago

It's normal. If you skip it, the F0s will be computed in the dataloader anyway.
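
Roughly speaking, when there is no F0 cache the same extraction has to run inside the dataloader workers instead, on every epoch, so skipping the precompute only moves the cost. A simplified, hypothetical sketch of that cache-or-compute pattern (not the actual TTS dataset code):

import os

import librosa
import numpy as np

def load_f0(wav_path, cache_dir=None):
    # Hypothetical helper, not Coqui TTS's actual code: with a cache dir the
    # slow extraction runs once and is reused; without one it runs every
    # time the sample is loaded.
    cache_file = None
    if cache_dir is not None:
        name = os.path.splitext(os.path.basename(wav_path))[0] + "_f0.npy"
        cache_file = os.path.join(cache_dir, name)
        if os.path.isfile(cache_file):
            return np.load(cache_file)  # cache hit: cheap
    y, sr = librosa.load(wav_path, sr=22050)
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=600.0, sr=sr)  # the slow part
    if cache_file is not None:
        os.makedirs(cache_dir, exist_ok=True)
        np.save(cache_file, f0)
    return f0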

phamkhactu commented 1 year ago

Hi @erogol, @arnav-newzera. I used the config above but set compute_f0=False to avoid the slow precomputation, and I get this error:

Traceback (most recent call last):
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 1591, in fit
    self._fit()
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 1544, in _fit
    self.train_epoch()
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 1309, in train_epoch
    _, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 1141, in train_step
    outputs, loss_dict_new, step_time = self._optimize(
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 1025, in _optimize
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
  File "/home/tupk/anaconda3/envs/nlp/lib/python3.8/site-packages/trainer/trainer.py", line 970, in _model_train_step
    return model.train_step(*input_args)
  File "/home/tupk/tupk/nlp/custom/TTS/TTS/tts/models/forward_tts.py", line 723, in train_step
    outputs = self.forward(
  File "/home/tupk/tupk/nlp/custom/TTS/TTS/tts/models/forward_tts.py", line 636, in forward
    o_pitch_emb, o_pitch, avg_pitch = self._forward_pitch_predictor(o_en, x_mask, pitch, dr)
ValueError: not enough values to unpack (expected 3, got 2)

Maybe it comes from the pitch files not having been computed. How can I fix it?
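
For reference, my rough reading of the failing call (stand-in code, not the actual forward_tts.py): when there is no ground-truth pitch in the batch, the pitch-predictor helper seems to return only two values, while train_step unpacks three.

import torch

def pitch_predictor_sketch(o_en, x_mask, pitch=None, dr=None):
    # Stand-in for _forward_pitch_predictor; the names and shapes are
    # placeholders, only the two-versus-three return values matter here.
    o_pitch = torch.zeros(o_en.shape[0], 1, o_en.shape[2])
    o_pitch_emb = torch.zeros_like(o_en)
    if pitch is None:  # the compute_f0=False case: no ground-truth pitch
        return o_pitch_emb, o_pitch
    avg_pitch = pitch.mean(dim=-1, keepdim=True)  # placeholder averaging
    return o_pitch_emb, o_pitch, avg_pitch

# train_step unpacks three values, so with pitch=None this raises the same
# "not enough values to unpack (expected 3, got 2)" error:
o_en = torch.zeros(2, 80, 50)
o_pitch_emb, o_pitch, avg_pitch = pitch_predictor_sketch(o_en, None, pitch=None)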

Thank you