Closed. atd closed this issue 1 year ago.
Describe the bug
I am trying, without success, to configure mimic3 with an es_ES voice on a Mark II. I cloned this repository (with git lfs) and copied voices/es_ES to /home/mycroft/.local/share/mycroft/mimic3/voices/.
I also added the following to .config/mycroft/mycroft.conf:
"tts": {
  "module": "mimic3_tts_plug",
  "mimic3_tts_plug": {
    "voice": "es_ES/carlfm_low",
    "preloaded_cache": "/opt/mycroft/preloaded_cache/Mimic3"
  }
}
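To rule out a path or download problem, a minimal Python sketch can read the `tts` section and check the directory the configured voice should live in. It is only a diagnostic under stated assumptions, not part of the plugin: it assumes `mycroft.conf` is plain JSON (no comments), that mimic3 resolves a voice key such as `es_ES/carlfm_low` to the subdirectory of the same name under the voices directory, and that the model file is named `generator.onnx`. `expected_voice_dir` and `check_voice` are hypothetical helper names. It also flags a Git LFS pointer file, which is what a clone made without LFS leaves in place of the real model.

```python
import json
from pathlib import Path

# Voices directory used in this issue.
VOICES_DIR = Path("/home/mycroft/.local/share/mycroft/mimic3/voices")

# A Git LFS pointer file starts with this line instead of real model data.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def expected_voice_dir(config: dict, voices_dir: Path = VOICES_DIR) -> Path:
    """Map the configured voice key (e.g. "es_ES/carlfm_low") to a directory.

    Assumption: the voice key is a relative path under the voices directory.
    """
    tts = config["tts"]
    plugin_settings = tts[tts["module"]]
    return voices_dir / plugin_settings["voice"]


def check_voice(config: dict, voices_dir: Path = VOICES_DIR) -> list:
    """Return a list of problems found; an empty list means none were found."""
    voice_dir = expected_voice_dir(config, voices_dir)
    if not voice_dir.is_dir():
        return [f"missing voice directory: {voice_dir}"]
    problems = []
    # Assumption: a mimic3 voice stores its model as generator.onnx.
    model = voice_dir / "generator.onnx"
    if not model.is_file():
        problems.append(f"missing model file: {model}")
    else:
        with model.open("rb") as f:
            if f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX:
                problems.append(f"{model} is a Git LFS pointer, not the model")
    return problems


if __name__ == "__main__":
    conf_path = Path.home() / ".config/mycroft/mycroft.conf"
    if conf_path.is_file():
        config = json.loads(conf_path.read_text())
        for problem in check_voice(config) or ["voice files look present"]:
            print(problem)
    else:
        print(f"no user config at {conf_path}")
```

If this reports no problems, the voice files are probably not the cause and the phonemizer becomes the next suspect.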
This is the log with the original en_UK voice:
Jan 28 17:29:16 localhost.localdomain python[11199]: DEBUG:mimic3_tts.tts:phonemes=[['s', 'ˈʌ', 'n'], ['l', 'ˈɑ', 's'], ['l', 'ˈɑ', 's'], ['d', 'ˈi', 'ʃ', 'ə', 'k', 'oʊ'], ['v', 'ˈi', 'n', 't', 'ɪ', 'n', 'ˈu', 'v'], ['‖']], ids=[1, 0, 23, 0, 5, 0, 44, 0, 20, 0, 4, 0, 18, 0, 5, 0, 33, 0, 23, 0, 4, 0, 18, 0, 5, 0, 33, 0, 23, 0, 4, 0, 10, 0, 5, 0, 15, 0, 42, 0, 36, 0, 17, 0, 21, 0, 4, 0, 27, 0, 5, 0, 15, 0, 20, 0, 24, 0, 40, 0, 20, 0, 5, 0, 26, 0, 27, 0, 4, 0, 3, 0, 3, 0, 4, 0, 2]
Jan 28 17:29:16 localhost.localdomain python[11199]: DEBUG:mimic3_tts.voice:TTS settings: speaker-id=0, length-scale=1.0, noise-scale=0.667, noise-w=0.8
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Copied 77 phoneme id(s) from request
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Request phonemes or ids are already present
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Phoneme ids are already present
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Synthesizing audio with 77 phoneme id(s)
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Allocating tensors
Jan 28 17:29:16 localhost.localdomain python[11772]: [2023-01-28 17:29:16.656] [mimic3] [debug] Running inference
Jan 28 17:29:18 localhost.localdomain python[11772]: [2023-01-28 17:29:18.417] [mimic3] [debug] Inference complete
Jan 28 17:29:18 localhost.localdomain python[11772]: [2023-01-28 17:29:18.417] [mimic3] [debug] Writing WAV file: /tmp/tmpht_ie02f*.wav
Jan 28 17:29:18 localhost.localdomain python[11772]: [2023-01-28 17:29:18.958] [mimic3] [debug] Cleaning up
Jan 28 17:29:18 localhost.localdomain python[11772]: [2023-01-28 17:29:18.958] [mimic3] [info] Real-time factor: 0.6117345965450164 (infer=1.761351749, audio=2.8792743764172335)
Jan 28 17:29:18 localhost.localdomain python[11772]: [2023-01-28 17:29:18.958] [mimic3] [info] Wrote /tmp/tmpht_ie02f*.wav
Jan 28 17:29:18 localhost.localdomain python[11199]: DEBUG:mimic3_tts.voice:RTF: 0.40078302642928665
Jan 28 17:29:18 localhost.localdomain python[11199]: DEBUG:audio:Submitted TTS chunk 1/1 for session 01da150a-2613-471d-b4e0-404c363230a0: Son las las dieciocho veintinueve
Jan 28 17:29:18 localhost.localdomain python[11199]: INFO:mycroft.util.log:Queued TTS chunk 1/1: file:///tmp/mycroft/cache/tts/mimic3_tts_plug/12d12a30f5fe5c86e943c3fcd13f3a89.wav (session=01da150a-2613-471d-b4e0-404c363230a0): Son las las dieciocho veintinueve
versus the log with the es_ES voice:
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:audio:Synthesizing: Ahora mismo son las las dieciocho cuar[...]
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:gruut.text_processor:No custom settings for language es_ES (es-es). Creating default settings.
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:mycroft.util.log:Started TTS session 792fa7da-8f68-48ca-b514-3d5ab535fde7
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:gruut.utils:(es-es) couldn't import module gruut_lang_es
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:gruut.utils:(es-es) searching [PosixPath('/home/mycroft/.config/gruut'), PosixPath('/opt/mycroft-dinkum/.venv/lib/python3.8/site-packages/data')] for language file(s)
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:mimic3_tts.tts:phonemes=[['‖']], ids=[1, 0, 3, 0, 3, 0, 4, 0, 2]
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:mimic3_tts.voice:TTS settings: speaker-id=0, length-scale=1.0, noise-scale=0.667, noise-w=0.8
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Copied 9 phoneme id(s) from request
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Request phonemes or ids are already present
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Phoneme ids are already present
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Synthesizing audio with 9 phoneme id(s)
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Allocating tensors
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.431] [mimic3] [debug] Running inference
Jan 28 17:46:04 localhost.localdomain python[20851]: DEBUG:mycroft.util.log:Audio finished: 792fa7da-8f68-48ca-b514-3d5ab535fde7
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.722] [mimic3] [debug] Inference complete
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.722] [mimic3] [debug] Writing WAV file: /tmp/tmpgt12jn6y*.wav
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.781] [mimic3] [debug] Cleaning up
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.781] [mimic3] [info] Real-time factor: 0.8949141602050782 (infer=0.290918127, audio=0.3250793650793651)
Jan 28 17:46:04 localhost.localdomain python[21421]: [2023-01-28 17:46:04.781] [mimic3] [info] Wrote /tmp/tmpgt12jn6y*.wav
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:mimic3_tts.voice:RTF: 0.546490836037794
Jan 28 17:46:04 localhost.localdomain python[20848]: DEBUG:audio:Submitted TTS chunk 1/1 for session 792fa7da-8f68-48ca-b514-3d5ab535fde7: Ahora mismo son las las dieciocho cuarenta y seis
Jan 28 17:46:04 localhost.localdomain python[20848]: INFO:mycroft.util.log:Queued TTS chunk 1/1: file:///tmp/mycroft/cache/tts/mimic3_tts_plug/afc83eb3ed54403bed497929e585eb2d.wav (session=792fa7da-8f68-48ca-b514-3d5ab535fde7): Ahora mismo son las las dieciocho cuarenta y seis
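It seems the phonemes are not being generated correctly. With the es_ES voice, the phoneme list contains only the break symbol (phonemes=[['‖']], 9 phoneme ids, about 0.33 s of audio). The English voice, by contrast, receives a full phoneme list (77 ids, about 2.9 s of audio).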
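This symptom can be detected mechanically from the `mimic3_tts.tts` debug line. The following is a minimal Python sketch, not a mimic3 feature. It parses the `phonemes=... , ids=...` part of such a line and reports whether every word consists only of break symbols. It assumes the log format shown above and treats `|` (minor break) and `‖` (major break) as the only break symbols. `parse_phoneme_line` and `phonemes_look_empty` are hypothetical names.

```python
import ast
import re

# Assumption: gruut-style break symbols, minor "|" and major "‖".
BREAK_SYMBOLS = {"|", "‖"}

# Captures the nested phoneme list and the flat id list from a debug line.
PHONEMES_RE = re.compile(r"phonemes=(\[.*?\]\]), ids=(\[[\d, ]*\])")


def parse_phoneme_line(line: str):
    """Return (phonemes, ids) parsed from a log line, or None if absent."""
    match = PHONEMES_RE.search(line)
    if match is None:
        return None
    # Both groups are Python literals, so literal_eval parses them safely.
    phonemes = ast.literal_eval(match.group(1))
    ids = ast.literal_eval(match.group(2))
    return phonemes, ids


def phonemes_look_empty(line: str) -> bool:
    """True when every word in the phoneme list contains only break symbols."""
    parsed = parse_phoneme_line(line)
    if parsed is None:
        return False
    phonemes, _ids = parsed
    return all(set(word) <= BREAK_SYMBOLS for word in phonemes)
```

Given the es_ES line `DEBUG:mimic3_tts.tts:phonemes=[['‖']], ids=[1, 0, 3, 0, 3, 0, 4, 0, 2]`, `phonemes_look_empty` returns `True`. For the en_UK line it returns `False`, because the phoneme list contains real phonemes.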
Expected behavior
I should hear the audio in the Spanish voice. Instead, I hear nothing.
Environment (please complete the following information):
Running pip install mycroft-plugin-tts-mimic3[en,es] installed the required gruut package.
See https://community.mycroft.ai/t/mark-ii-language-change/13221/4
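The es_ES log contains `(es-es) couldn't import module gruut_lang_es`, which points to missing Spanish language data for gruut in the Python environment Mycroft uses. A minimal Python sketch can report which modules that environment cannot find. It assumes, based on that log message, that the Spanish data is provided by an importable module named `gruut_lang_es`. `missing_modules` is a hypothetical helper. Run it with the same interpreter Mycroft uses, so that it checks the same environment.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names taken from the gruut debug lines in the es_ES log.
    wanted = ["gruut", "gruut_lang_es"]
    missing = missing_modules(wanted)
    print(f"interpreter: {sys.executable}")
    print("missing:", ", ".join(missing) if missing else "none")
```

For example, run it as `/opt/mycroft-dinkum/.venv/bin/python check_modules.py` (the script name is hypothetical). That venv is the one that appears in gruut's search path in the log. If `gruut_lang_es` is listed as missing, reinstalling the plugin with the `es` extra in that venv should address it.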