brailcom / speechd

Common high-level interface to speech synthesis
GNU General Public License v2.0

Delimiter handling exception #931

Status: Open. shenghuang147 opened this issue 4 months ago.

shenghuang147 commented 4 months ago

Summary

Delimiters are not handled correctly when no space follows them, as observed with the command spd-say -w "hello,this,is,a,test".

Steps to Reproduce

Expected Behavior

The message should be split correctly at each delimiter, regardless of whether there is whitespace around the delimiter. For the input "hello,this,is,a,test", each fragment ("hello", "this", "is", "a", "test") should be returned in order.

Actual Behavior

Segmentation fails when no space follows the delimiters: they are ignored as if they were not present, and the entire message is returned as a single segment. In the log below, for example, get_part returns all 34 bytes of "Hello,this is a test,does it work?" as one part.
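The reported behavior is consistent with a rule in which a delimiter only ends a segment when whitespace, or the end of the text, follows it. The C sketch below models that rule; `count_segments_space_required()` is a hypothetical helper written for illustration, not the actual speechd code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the current rule (not speechd's code):
 * a character from `delims` ends a segment only if it is followed by
 * whitespace or by the end of the text. Returns the segment count. */
static size_t count_segments_space_required(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);
    size_t segments = 0;
    size_t len = strlen(text);
    size_t start = 0;

    for (size_t i = 0; i < len; i++) {
        /* text[i + 1] is the terminating '\0' when i is the last index. */
        int next = (unsigned char)text[i + 1];
        if (strchr(delims, text[i]) != NULL && (next == '\0' || isspace(next))) {
            segments++;          /* text[start..i] forms one segment */
            start = i + 1;
        }
    }
    /* Count leftover text without a final delimiter, ignoring
     * whitespace-only leftovers. */
    for (size_t i = start; i < len; i++) {
        if (!isspace((unsigned char)text[i])) {
            segments++;
            break;
        }
    }
    return segments;
}
```

With the delimiters from the log (`,?!;`), "hello,this,is,a,test" yields a single segment, and "Hello,this is a test,does it work?" is split only at the final "?", matching the single 34-byte part in the log, whereas "hello, this, is, a, test" yields five segments.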

Importance of Fix

In many languages, such as Chinese, punctuation marks, including commas, are not usually followed by a space. Fixing this issue is therefore important for proper segmentation across different language conventions and text formatting styles.

Log

epos-generic.log

 Sat Jun 15 02:30:17 2024 [635271]: Added voice zh-CN-XiaoxiaoNeural

 Sat Jun 15 02:30:17 2024 [635296]: Added voice zh-CN-XiaoxiaoNeural

 Sat Jun 15 02:30:17 2024 [635310]: Added voice zh-CN-XiaoyiNeural

 Sat Jun 15 02:30:17 2024 [635351]: Configuration (pre) has been read from "/etc/speech-dispatcher/modules/epos-generic.conf"

 Sat Jun 15 02:30:17 2024 [635369]: GenericMaxChunkLength = 300

 Sat Jun 15 02:30:17 2024 [635376]: GenericDelimiters = ,?!;

 Sat Jun 15 02:30:17 2024 [635383]: GenericExecuteSynth = printf %s '$DATA' | edge-tts --voice $VOICE --file /dev/stdin 2>/dev/null | play -t mp3 -q --ignore-length - 

 Sat Jun 15 02:30:17 2024 [635390]: GenericCmdDependency = printf

 Sat Jun 15 02:30:17 2024 [635397]: GenericPortDependency = 0

 Sat Jun 15 02:30:17 2024 [635548]: Generic: creating new thread for generic_speak

 Sat Jun 15 02:30:17 2024 [635724]: generic: speaking thread starting.......

 Sat Jun 15 02:30:17 2024 [635917]: Opening audio output system
 Sat Jun 15 02:30:17 2024 [636259]: Opening audio output system
 Sat Jun 15 02:30:17 2024 [721160]: Using pulse audio output method
 Sat Jun 15 02:30:18 2024 [178217]: speak()

 Sat Jun 15 02:30:18 2024 [178226]: Setting language zh-cn
 Sat Jun 15 02:30:18 2024 [178233]: Requested option by key zh-cn not found.

 Sat Jun 15 02:30:18 2024 [178241]: Setting voice type 1
 Sat Jun 15 02:30:18 2024 [178248]: There are no voices in the table for language=zh

 Sat Jun 15 02:30:18 2024 [178255]: Invalid voice type specified or no voice available!
 Sat Jun 15 02:30:18 2024 [178261]: Setting voice type 1
 Sat Jun 15 02:30:18 2024 [178268]: There are no voices in the table for language=zh

 Sat Jun 15 02:30:18 2024 [178274]: Invalid voice type specified or no voice available!
 Sat Jun 15 02:30:18 2024 [178285]: Volume: 100
 Sat Jun 15 02:30:18 2024 [178292]: HVolume: 100.000000
 Sat Jun 15 02:30:18 2024 [178304]: In stripping ssml: |Hello,this is a test,does it work?|
 Sat Jun 15 02:30:18 2024 [178316]: Requested data (0): |Hello,this is a test,does it work?|

 Sat Jun 15 02:30:18 2024 [178327]: Generic: leaving write() normally

 Sat Jun 15 02:30:18 2024 [178333]: Semaphore on

 Sat Jun 15 02:30:18 2024 [178536]: Entering parent process, closing pipes
 Sat Jun 15 02:30:18 2024 [178578]:   Looping...

 Sat Jun 15 02:30:18 2024 [178588]: Returned 34 bytes from get_part

 Sat Jun 15 02:30:18 2024 [178597]: Sending buf to child:|Hello,this is a test,does it work?| 34

 Sat Jun 15 02:30:18 2024 [178606]: going to write 34 bytes
 Sat Jun 15 02:30:18 2024 [178618]: written 34 bytes
 Sat Jun 15 02:30:18 2024 [178627]: Waiting for response from child...

 Sat Jun 15 02:30:18 2024 [178713]: Starting child...

 Sat Jun 15 02:30:18 2024 [178740]: UnBlocking user signal
 Sat Jun 15 02:30:18 2024 [178752]: Entering child loop

 Sat Jun 15 02:30:18 2024 [178762]: read 34 bytes in child
 Sat Jun 15 02:30:18 2024 [178770]: text read is: |Hello,this is a test,does it work?|

 Sat Jun 15 02:30:18 2024 [178807]: child: escaped text is |Hello,this is a test,does it work?|
 Sat Jun 15 02:30:18 2024 [178815]: child: synth command = |set -o pipefail ; printf %s 'Hello,this is a test,does it work?' | edge-tts --voice zh-CN-XiaoxiaoNeural --file /dev/stdin 2>/dev/null | play -t mp3 -q --ignore-length - |
 Sat Jun 15 02:30:18 2024 [178822]: Speaking in child...
 Sat Jun 15 02:30:18 2024 [178828]: Blocking user signal
 Sat Jun 15 02:30:24 2024 [694842]: subchild terminated -: exit?:1 status:0 signal?:0 signal number:0.

 Sat Jun 15 02:30:24 2024 [694873]: UnBlocking user signal
 Sat Jun 15 02:30:24 2024 [694893]: child->parent: ok, send more data
 Sat Jun 15 02:30:24 2024 [694921]: Ok, received report to continue...

 Sat Jun 15 02:30:24 2024 [694934]:   Looping...

 Sat Jun 15 02:30:24 2024 [694944]: Returned -1 bytes from get_part

 Sat Jun 15 02:30:24 2024 [694952]: End of data in parent, closing pipes
 Sat Jun 15 02:30:24 2024 [694966]:  Sat Jun 15 02:30:24 2024 [694969]Waiting for child...: 
read 0 bytes in child
 Sat Jun 15 02:30:24 2024 [694982]: child: Pipe closed, exiting, closing pipes..

 Sat Jun 15 02:30:24 2024 [695008]: Child ended...

 Sat Jun 15 02:30:24 2024 [696141]: child terminated -: exit?:1 status:0 signal?:0 signal number:0.

 Sat Jun 15 02:30:29 2024 [872848]: generic: stop()

 Sat Jun 15 02:30:29 2024 [872863]: generic: close()
  
sthibaul commented 4 months ago

This seems to come from commit 22b3cdb36ff663ee3aae97b1635fd6b8837a11ec.

sthibaul commented 4 months ago

Thanks for the precise report. AIUI we do want to keep the space requirement for the "." case; otherwise we would spuriously split sentences inside numbers such as 3.14.

I'd say we want to add a dividers_nospace parameter to module_get_message_part, whose characters split the text without requiring a subsequent space, and then add the corresponding configuration options to the few modules that use it, with a useful default value.
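A sketch of the proposed split, assuming two delimiter sets passed as plain strings: `dividers`, which still require a following space (so "3.14" stays intact), and `dividers_nospace`, which always split. The function name `count_segments()` and the example value `,?!;` for `dividers_nospace` are assumptions for illustration; the real parameter list and defaults of module_get_message_part may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the proposed behavior (not speechd's API):
 * - characters in `dividers` end a segment only when followed by
 *   whitespace or the end of the text, so "3.14" is not split;
 * - characters in `dividers_nospace` always end a segment. */
static size_t count_segments(const char *text, const char *dividers,
                             const char *dividers_nospace)
{
    assert(text != NULL && dividers != NULL && dividers_nospace != NULL);
    size_t segments = 0;
    size_t len = strlen(text);
    size_t start = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char next = (unsigned char)text[i + 1];
        int split = strchr(dividers_nospace, text[i]) != NULL
                    || (strchr(dividers, text[i]) != NULL
                        && (next == '\0' || isspace(next)));
        if (split) {
            segments++;          /* text[start..i] forms one segment */
            start = i + 1;
        }
    }
    /* Count leftover text without a final delimiter, ignoring
     * whitespace-only leftovers. */
    for (size_t i = start; i < len; i++) {
        if (!isspace((unsigned char)text[i])) {
            segments++;
            break;
        }
    }
    return segments;
}
```

With these sets, "hello,this,is,a,test" gives five segments, while "Pi is 3.14. Done" gives two, because the "." inside the number is not followed by a space.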

Note that module_get_message_part currently processes text only as ASCII, not UTF-8. That is a separate concern, which should also be easy to fix thanks to g_utf8_get_char and g_utf8_next_char.
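Iterating by code point instead of by byte lets the delimiter check see full-width punctuation such as "，" (U+FF0C) and "。" (U+3002), which are multi-byte in UTF-8 and can never match a single ASCII byte. In speechd this would naturally use GLib's g_utf8_get_char and g_utf8_next_char; to keep the sketch buildable without GLib, a small hand-written decoder (`utf8_next()`, hypothetical) stands in for them and assumes valid UTF-8 input. `count_utf8_segments()` is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal UTF-8 decoder standing in for GLib's g_utf8_get_char() and
 * g_utf8_next_char() so this sketch builds without GLib. Assumes valid
 * UTF-8 (no validation). Stores the code point in *cp and returns a
 * pointer to the next character. */
static const char *utf8_next(const char *s, uint32_t *cp)
{
    const unsigned char *p = (const unsigned char *)s;
    if (p[0] < 0x80) {                       /* 1 byte: 0xxxxxxx */
        *cp = p[0];
        return s + 1;
    } else if ((p[0] & 0xE0) == 0xC0) {      /* 2 bytes: 110xxxxx */
        *cp = ((uint32_t)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return s + 2;
    } else if ((p[0] & 0xF0) == 0xE0) {      /* 3 bytes: 1110xxxx */
        *cp = ((uint32_t)(p[0] & 0x0F) << 12)
              | ((uint32_t)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return s + 3;
    }
    /* 4 bytes: 11110xxx */
    *cp = ((uint32_t)(p[0] & 0x07) << 18) | ((uint32_t)(p[1] & 0x3F) << 12)
          | ((uint32_t)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return s + 4;
}

/* Hypothetical example: count segments split at any code point listed
 * in `delims` (no following space required), working on characters
 * rather than bytes. */
static size_t count_utf8_segments(const char *text, const uint32_t *delims,
                                  size_t n_delims)
{
    assert(text != NULL);
    size_t segments = 0;
    int pending = 0;     /* non-delimiter text seen since the last split */
    const char *p = text;

    while (*p != '\0') {
        uint32_t cp;
        p = utf8_next(p, &cp);
        int is_delim = 0;
        for (size_t k = 0; k < n_delims; k++)
            if (cp == delims[k])
                is_delim = 1;
        if (is_delim) {
            segments++;
            pending = 0;
        } else {
            pending = 1;
        }
    }
    return segments + (pending ? 1 : 0);
}
```

For example, with U+FF0C and U+3002 as delimiters, "你好，这是一个测试。" yields two segments.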

shenghuang147 commented 4 months ago

Thank you for your work. I'm not sure whether I should raise this in this issue, but GenericMaxChunkLength seems to judge the length in bytes rather than characters, and I think it would be more appropriate to use characters. Also, I found that with GenericMaxChunkLength enabled, when reading non-ASCII text aloud, in some cases the last character of the text that should be read is lost.

I'll try to reproduce this later and submit the logs.
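To put the byte/character difference in numbers: most CJK characters take three bytes in UTF-8, so if the limit is counted in bytes, GenericMaxChunkLength = 300 (the value in the log above) corresponds to roughly 100 Chinese characters. A minimal C sketch counting both, with a hypothetical helper `utf8_char_count()` that counts every byte that is not a UTF-8 continuation byte:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count UTF-8 code points. Every byte that is not
 * a continuation byte (bit pattern 10xxxxxx) starts a new character.
 * Assumes valid UTF-8 input. */
static size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

For "你好，世界", strlen() reports 15 bytes while utf8_char_count() reports 5 characters; for ASCII text such as "hello" the two counts agree.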

sthibaul commented 4 months ago

GenericMaxChunkLength seems to judge the length in bytes rather than characters, and I think it would be more appropriate to use characters

I don't think it's worth changing it: it's a very rough guess anyway.

in some cases the last character of the text that should be read is lost

Which version did you test with? Note that I fixed #806 recently

shenghuang147 commented 3 months ago

Which version did you test with? Note that I fixed #806 recently

I am very sorry: I have confirmed that this issue is not related to speechd; the problem comes from Okular.