coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] If sentence too long, some part will be missing during audio file generation #1680

Closed hengway closed 2 years ago

hengway commented 2 years ago

Describe the bug

If a sentence is too long (with clauses separated by commas), part of it will be missing from the generated audio.

Example: On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.

The missing part is: he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.

As a workaround, shorten the sentence by replacing a comma with a full stop: On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later. He would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
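The manual workaround above can be automated. The following is a minimal sketch, not part of the TTS library: the helper name `split_long_sentence` and the `max_chars` threshold are assumptions chosen for illustration, and the right threshold for a given model is not established in this report.

```python
def split_long_sentence(sentence: str, max_chars: int = 250) -> list[str]:
    """Split one sentence at commas so each piece stays within max_chars.

    Hypothetical helper: pieces that are cut off early get a full stop
    appended, mirroring the manual comma-to-full-stop workaround.
    A single clause longer than max_chars is left unsplit.
    """
    if len(sentence) <= max_chars:
        return [sentence]

    clauses = [c.strip() for c in sentence.split(",") if c.strip()]
    chunks: list[str] = []
    current = ""
    for clause in clauses:
        candidate = f"{current}, {clause}" if current else clause
        if current and len(candidate) > max_chars:
            # Close the running chunk as its own sentence and start a new one.
            chunks.append(current.rstrip(".") + ".")
            current = clause
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    text = (
        "Little did he realize that three and a half years later, "
        "he would be standing on the White House lawn, receiving the "
        "nations highest award for his bravery and courage under fire."
    )
    for piece in split_long_sentence(text, max_chars=90):
        print(piece)
```

The resulting pieces can be joined with spaces and passed to `--text` in place of the original sentence.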

To Reproduce

Run the command below:

tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /var/data/The-unlikely-hero5.wav

Expected behavior

The generated audio file should contain the whole text.

Logs

ubuntu@ubuntu:~$ tts --text "On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor. " --model_name "tts_models/en/ljspeech/tacotron2-DDC_ph" --out_path /opt/tts_output/The-unlikely-hero5.wav
 > tts_models/en/ljspeech/tacotron2-DDC_ph is already downloaded.
 > vocoder_models/en/ljspeech/univnet is already downloaded.
 > Using model: Tacotron2
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/tts_models--en--ljspeech--tacotron2-DDC_ph/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Model's reduction rate `r` is set to: 2
 > Vocoder Model: univnet
 > Setting up Audio Processor...
 | > sample_rate:22050
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:50.0
 | > mel_fmax:7600.0
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:1.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:60
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:/home/xstts/.local/share/tts/vocoder_models--en--ljspeech--univnet/scale_stats.npy
 | > base:10
 | > hop_length:256
 | > win_length:1024
 > Generator Model: univnet_generator
 > Discriminator Model: univnet_discriminator
 > Text: On April 1, 1942, Desmond Doss joined the United States Army. Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire. Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.
 > Text splitted to sentences.
['On April 1, 1942, Desmond Doss joined the United States Army.', 'Little did he realize that three and a half years later, he would be standing on the White House lawn, receiving the nations highest award for his bravery and courage under fire.', 'Of the 16 million men in uniform during World War 2, only 431 received the Congressional Medal of Honor.']
ɔn eɪpɹəl wʌn, naɪntin fɔɹti tu, dɛzmənd dɔs d͡ʒɔɪnd ðə junaɪtɪd steɪts ɑɹmi.
 [!] Character '͡' not found in the vocabulary. Discarding it.
[W NNPACK.cpp:51] Could not initialize NNPACK! Reason: Unsupported hardware.
 > Processing time: 18.15455675125122
 > Real-time factor: 0.9681247735486627
 > Saving output to /opt/tts_output/The-unlikely-hero5.wav

Environment

Package                Version              Location
---------------------- -------------------- --------
anyascii               0.3.1
appdirs                1.4.4
astroid                2.7.3
attrs                  19.3.0
audioread              2.1.9
Automat                0.8.0
Babel                  2.10.3
backports.zoneinfo     0.2.1
black                  22.3.0
blinker                1.4
bokeh                  1.4.0
certifi                2019.11.28
cffi                   1.15.0
chardet                3.0.4
click                  8.1.3
cloud-init             22.2
colorama               0.4.3
command-not-found      0.3
configobj              5.0.6
constantly             15.1.0
coqpit                 0.0.16
coverage               6.4.1
cryptography           2.8
cycler                 0.11.0
Cython                 0.29.28
dateparser             1.1.1
dbus-python            1.2.16
decorator              5.1.1
distro                 1.4.0
distro-info            0.23ubuntu1
docopt                 0.6.2
entrypoints            0.3
Flask                  2.1.2
fonttools              4.33.3
fsspec                 2022.5.0
gruut                  2.2.3
gruut-ipa              0.13.0
gruut-lang-cs          2.0.0
gruut-lang-de          2.0.0
gruut-lang-en          2.0.0
gruut-lang-es          2.0.0
gruut-lang-fr          2.0.2
gruut-lang-it          2.0.0
gruut-lang-nl          2.0.2
gruut-lang-pt          2.0.0
gruut-lang-ru          2.0.0
gruut-lang-sv          2.0.0
httplib2               0.14.0
hyperlink              19.0.0
idna                   2.8
importlib-metadata     4.11.4
importlib-resources    5.8.0
incremental            16.10.1
inflect                5.6.0
isort                  5.10.1
itsdangerous           2.1.2
jieba                  0.42.1
Jinja2                 3.1.2
joblib                 1.1.0
jsonlines              1.2.0
jsonpatch              1.22
jsonpointer            2.0
jsonschema             3.2.0
keyring                18.0.1
kiwisolver             1.4.3
language-selector      0.1
launchpadlib           1.10.13
lazr.restfulclient     0.14.2
lazr.uri               1.0.3
lazy-object-proxy      1.7.1
librosa                0.8.0
llvmlite               0.38.1
MarkupSafe             2.1.1
matplotlib             3.5.2
mccabe                 0.6.1
mecab-python3          1.0.5
more-itertools         4.2.0
mypy-extensions        0.4.3
netifaces              0.10.4
networkx               2.8.4
nose2                  0.11.0
num2words              0.5.10
numba                  0.55.1
numpy                  1.21.6
oauthlib               3.1.0
packaging              21.3
pandas                 1.4.2
pathspec               0.9.0
pexpect                4.6.0
Pillow                 9.1.1
pip                    20.0.2
platformdirs           2.5.2
pooch                  1.6.0
protobuf               3.19.4
pyasn1                 0.4.2
pyasn1-modules         0.2.1
pycparser              2.21
PyGObject              3.36.0
PyHamcrest             1.9.0
PyJWT                  1.7.1
pylint                 2.10.2
pymacaroons            0.13.0
PyNaCl                 1.3.0
pynndescent            0.5.7
pyOpenSSL              19.0.0
pyparsing              3.0.9
pypinyin               0.46.0
pyrsistent             0.15.5
pysbd                  0.3.4
pyserial               3.4
python-apt             2.0.0+ubuntu0.20.4.7
python-crfsuite        0.9.8
python-dateutil        2.8.2
python-debian          0.1.36ubuntu1
pytz                   2022.1
pytz-deprecation-shim  0.1.0.post0
pyworld                0.2.10
PyYAML                 5.3.1
regex                  2022.3.2
requests               2.22.0
requests-unixsocket    0.2.0
resampy                0.2.2
scikit-learn           1.1.1
scipy                  1.8.1
SecretStorage          2.3.1
service-identity       18.1.0
setuptools             45.2.0
simplejson             3.16.0
six                    1.14.0
sos                    4.3
SoundFile              0.10.3.post1
ssh-import-id          5.10
systemd-python         234
tensorboardX           2.5.1
threadpoolctl          3.1.0
toml                   0.10.2
tomli                  2.0.1
torch                  1.11.0
torchaudio             0.11.0
tornado                6.1
tqdm                   4.64.0
trainer                0.0.12
TTS                    0.7.0                /opt/TTS
Twisted                18.9.0
typing-extensions      4.2.0
tzdata                 2022.1
tzlocal                4.2
ubuntu-advantage-tools 27.8
ufw                    0.36
umap-learn             0.5.1
unattended-upgrades    0.1
unidic-lite            1.0.8
urllib3                1.25.8
wadllib                1.3.3
Werkzeug               2.1.2
wheel                  0.34.2
wrapt                  1.12.1
zipp                   3.8.0
zope.interface         4.7.1

Additional context

No response

erogol commented 2 years ago

For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

genglinxiao commented 1 year ago

I'm also looking for ways to generate long sentences. What I've found is that the limit is actually in the tokenizer and is hard-coded:

class VoiceBpeTokenizer:
    def __init__(self, vocab_file=None):
        self.tokenizer = None
        if vocab_file is not None:
            self.tokenizer = Tokenizer.from_file(vocab_file)
        self.char_limits = {
            "en": 250,
            "de": 253,
            "fr": 273,
            "es": 239,
            "it": 213,
            "pt": 203,
            "pl": 224,
            "zh-cn": 82,
            "ar": 166,
            "cs": 186,
            "ru": 182,
            "nl": 251,
            "tr": 226,
            "ja": 71,
            "hu": 224,
            "ko": 95,
        }

So you can simply modify the limit. However, I'm not sure about the downstream effect.
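Because `char_limits` is an ordinary dictionary stored on the instance, one way to change a limit is to overwrite the entry at runtime instead of editing the source file. The sketch below illustrates the idea on a hypothetical stand-in class (`StandInTokenizer` and its `is_within_limit` method are invented for this example and are not the library's API). Whether the real model produces good audio past its default limit is not confirmed here.

```python
class StandInTokenizer:
    """Hypothetical stand-in that mirrors only the char_limits attribute."""

    def __init__(self):
        # Per-language character limits, as in the excerpt above (subset).
        self.char_limits = {"en": 250, "de": 253}

    def is_within_limit(self, text: str, lang: str) -> bool:
        # Hypothetical check: compare text length to the per-language limit.
        return len(text) <= self.char_limits[lang]


tokenizer = StandInTokenizer()
long_text = "a" * 300

before = tokenizer.is_within_limit(long_text, "en")  # 300 > 250

# Override the limit on this instance only; the class definition
# (and any installed package source) stays untouched.
tokenizer.char_limits["en"] = 400

after = tokenizer.is_within_limit(long_text, "en")  # 300 <= 400
print(before, after)
```

The override affects only the instance it is applied to, so it has to be repeated wherever a new tokenizer object is created.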

FurkanGozukara commented 1 year ago

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

What is the limit for TTS V2? I saw 400 tokens in the code.

m000lie commented 8 months ago

> For Tacotron models there is a cap of 250 chars not to crash your memory. You need to set it manually if you wanna change it.

How much memory is it expected to use per char? I have access to 1x H100 SCM 80GB. Surely memory shouldn't be a problem, right?

OlegRuban-ai commented 2 weeks ago

@genglinxiao Is there a way to make these changes inside the code installed with pip, without having to clone the repository?