echogarden-project / echogarden

Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
GNU General Public License v3.0

Error: Word end marker for index 6 is not consistent with word index. #57

Open an-lee opened 4 months ago

an-lee commented 4 months ago

An error occurred when using the align command.

➜  echogarden git:(main) echogarden align ~/Downloads/0515_CHI_ko_3.mp3 ~/Downloads/0515_CHI_ko_3.txt
Echogarden v1.5.0

Transcode with command-line ffmpeg.. 1307.3ms
Convert wave buffer to raw audio.. 3.3ms
Resample audio to 16kHz mono.. 195.3ms
Crop using voice activity detection.. 121.8ms
Normalize and trim audio.. 19.4ms
No language specified. Detect language using reference text.. 183.6ms
Language detected: Korean (ko)
Load alignment module.. 0.2ms
Synthesize alignment reference with eSpeak.. Error: Word end marker for index 6 is not consistent with word index. The words were: [
  '예는',       '프로그램은',
  '매력적인',   '아이돌을',
  '통해',       '시청자를',
  '끌어들이며', ',',
  '이는',       '단순한',
  '오락을',     '넘어',
  '사람들이',   '더',
  '나은',       '삶을',
  '살도록',     '영감을',
  '줍니다',     '.'
]

Test audio & text

audio: 0515_CHI_ko_3.mp3
text: 0515_CHI_ko_3.txt

Thanks for your work!

rotemdan commented 3 months ago

Thanks. Sorry it took me a long time to get to this.

This appears to be an unreported eSpeak-ng bug that is particular to its Korean voice. A marker is omitted from the events when it appears before or after a comma character (,), or possibly other punctuation characters. That causes an inconsistency in the markers that produces the error.
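To illustrate how one missing marker leads to this error, here is a minimal sketch of a consistency check over word start and end marker events. The `MarkerEvent` type and the `findFirstInconsistency` function are hypothetical names made up for this example and are not Echogarden's actual code. Assuming zero-based word indexes, index 6 in the list above is '끌어들이며', the word directly before the comma, which fits the description of a marker being dropped next to a comma.

```typescript
// Hypothetical event shape for illustration only; not Echogarden's real types.
type MarkerEvent = { kind: 'start' | 'end'; wordIndex: number }

// Returns an error message for the first end marker that does not close
// the most recently opened word, or undefined if all markers pair up.
function findFirstInconsistency(events: MarkerEvent[]): string | undefined {
    let openWordIndex = -1 // index of the word whose start marker was seen last
    let isOpen = false     // true while a start marker is waiting for its end marker

    for (const event of events) {
        if (event.kind === 'start') {
            openWordIndex = event.wordIndex
            isOpen = true
        } else {
            if (!isOpen || event.wordIndex !== openWordIndex) {
                return `Word end marker for index ${event.wordIndex} is not consistent with word index.`
            }
            isOpen = false
        }
    }

    return undefined
}

// Complete event stream for words 5 to 7: every start has a matching end.
const completeEvents: MarkerEvent[] = [
    { kind: 'start', wordIndex: 5 }, { kind: 'end', wordIndex: 5 },
    { kind: 'start', wordIndex: 6 }, { kind: 'end', wordIndex: 6 },
    { kind: 'start', wordIndex: 7 }, { kind: 'end', wordIndex: 7 },
]

// Same stream, but the start marker for word 6 was never emitted.
const eventsWithDroppedMarker: MarkerEvent[] = completeEvents.filter(
    (event) => !(event.kind === 'start' && event.wordIndex === 6)
)

console.log(findFirstInconsistency(completeEvents))
console.log(findFirstInconsistency(eventsWithDroppedMarker))
```

With the start marker for word 6 missing, the end marker for word 6 arrives while no word is open, so the check fails at that index.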

I already have lots of workarounds for many different marker bugs.

I'll need to find a good workaround for this one. It seems the standard one (adding () before or after the marker), which works with virtually all the voices I've tried, doesn't work with the Korean voice.
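As a rough sketch of what such a padding workaround could look like, the snippet below wraps each word in SSML `<mark>` elements (which eSpeak-ng can report as marker events) and optionally places `()` between each marker and the word. The marker names `s-N` and `e-N`, the function names, and the exact placement of the `()` are assumptions made for this example. The thread does not show how Echogarden actually builds its synthesis input.

```typescript
// Escape characters that would otherwise break the XML markup.
function escapeXml(text: string): string {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
}

// Hypothetical helper: surrounds each word with a start and an end mark.
// `pad` is inserted between each marker and the word (assumed placement).
function buildMarkedText(words: string[], pad: '' | '()' = ''): string {
    return words
        .map((word, i) => `<mark name="s-${i}"/>${pad}${escapeXml(word)}${pad}<mark name="e-${i}"/>`)
        .join(' ')
}

// Words 6 and 7 from the error output: the word before the comma, and the comma.
const sampleWords = ['끌어들이며', ',']

console.log(buildMarkedText(sampleWords))
console.log(buildMarkedText(sampleWords, '()'))
```

The idea behind the padding is to keep a marker from sitting directly next to punctuation. According to the comment above, this kind of padding is not enough for the Korean voice.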

an-lee commented 3 months ago

Thank you for your response and effort.

This seems to be a challenging task. Please take your time with it.

rotemdan commented 3 months ago

I didn't realize that the problem is common with Korean texts. I would definitely want to find a good workaround to include in the next release, but it seems not to be as straightforward as I thought.