Pedal-Intelligence / saypi-userscript

An independent voice interface for Inflection AI's conversational assistant, Pi
https://www.saypi.ai/

Not speaking Asian #85

Closed: rosscado closed this issue 3 days ago

rosscado commented 1 week ago

Messages from Pi in Chinese (and, as detailed below, several other languages) are not read aloud in full.

Languages with Persistent Issues

Chinese Example

In the following message, only the first couple of characters, "哇,这" ("Wow, this"), were streamed. Because the text input stream to the TTS engine ended prematurely, the audio output was very short (monosyllabic).

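哇,这个功能听起来真的很酷!🤩让用户有更多的自由度来选择自己喜欢的声音,可以更好地满足他们的个人需求。这样,他们可以使用让他们感到更舒适或更容易与之互动的声音。我确信这将是一个极具吸引力的功能,对于那些对声音有着高度个性化要求的用户来说。

(English: "Wow, this feature sounds really cool! 🤩 Giving users more freedom to choose the voice they like can better meet their individual needs. That way, they can use a voice that makes them feel more comfortable or that is easier to interact with. I'm sure this will be a very attractive feature for users who have highly individual requirements for voices.")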

saypi.user.js:38038 +0ms, streamed text: "哇,这"
saypi.user.js:37144 Streaming text began with "哇,这"
saypi.user.js:32108 Loading audio from https://api.saypi.ai/speak/583340d6-4386-45f2-81ff-6fb0faf01816/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=zh
saypi.user.js:37211 Stream ended on 哇,这 after 2424ms of inactivity
saypi.user.js:37130 Clearing timeout on stream completion
saypi.user.js:38052 Text stream complete after 3.6 seconds
saypi.user.js:37985 Closed audio input stream 583340d6-4386-45f2-81ff-6fb0faf01816
saypi.user.js:37986 Streamed text: 哇,这
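The logs suggest the text stream is closed by an inactivity timeout: once no new text has been observed for a couple of seconds ("after 2424ms of inactivity"), the stream ends with whatever was collected so far. A minimal sketch of that kind of idle-timeout stream is below, to show why a missed text update is fatal. The class name `IdleTextStream`, the explicit timestamps (used instead of `setTimeout` so the logic is deterministic), and the 2000 ms threshold are all assumptions, not SayPi's actual code or settings.

```javascript
// Hypothetical sketch of an idle-timeout text stream; not SayPi's implementation.
// Timestamps are passed in explicitly (e.g. from Date.now()) instead of using
// setTimeout, so the behaviour is deterministic.
class IdleTextStream {
  constructor(idleMs = 2000) { // assumed threshold; the real value isn't stated
    this.idleMs = idleMs;
    this.chunks = [];
    this.lastActivity = null;
    this.ended = false;
  }

  // Append a chunk of text observed at time `now` (ms) and reset the idle clock.
  push(chunk, now) {
    if (this.ended) throw new Error("cannot push to an ended stream");
    this.chunks.push(chunk);
    this.lastActivity = now;
  }

  // Checked by the caller; ends the stream once no text has arrived for idleMs.
  // Returns true if the stream has ended.
  poll(now) {
    if (!this.ended && this.lastActivity !== null && now - this.lastActivity >= this.idleMs) {
      this.ended = true;
    }
    return this.ended;
  }

  get text() {
    return this.chunks.join("");
  }
}

// Replaying the Chinese log: only the first chunk is ever observed.
const stream = new IdleTextStream(2000);
stream.push("哇,这", 0);
console.log(stream.poll(1000)); // false: still within the idle window
console.log(stream.poll(2424)); // true: the stream ends on the fragment
console.log(stream.text);       // 哇,这
```

Whatever the timeout value, the outcome is the same: if later text is never observed, the stream ends on the fragment.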

Japanese Example

Where Pi's message was as follows, only "どういたしまして!" ("You're welcome!") was streamed.

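どういたしまして!😊今後も、ご要望や質問がございましたら、お気軽にお申し付けくださいね!

(English: "You're welcome! 😊 If you have any requests or questions in the future, please feel free to let me know!")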

Opened audio input stream 11fed219-534e-46a0-8813-3cd22a348261
saypi.user.js:38038 +0ms, streamed text: "どういたしまして!"
saypi.user.js:37144 Streaming text began with "どういたしまして!"
saypi.user.js:32108 Loading audio from https://api.saypi.ai/speak/11fed219-534e-46a0-8813-3cd22a348261/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ja
saypi.user.js:37211 Stream ended on どういたしまして! after 2510ms of inactivity
saypi.user.js:37130 Clearing timeout on stream completion
saypi.user.js:38052 Text stream complete after 3.8 seconds
saypi.user.js:37985 Closed audio input stream 11fed219-534e-46a0-8813-3cd22a348261
saypi.user.js:37986 Streamed text: どういたしまして!
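The issue doesn't pin down why streaming stops after the first sentence in Chinese and Japanese. One plausible factor worth checking is boundary detection that expects whitespace after sentence-ending punctuation, which these languages don't use. The sketch below contrasts a naive whitespace-based splitter with the standard `Intl.Segmenter` API at sentence granularity, which applies the Unicode sentence-break rules and doesn't need a following space. `splitNaive` and `splitSentences` are hypothetical helpers, not functions from the SayPi codebase.

```javascript
// Naive splitter: assumes each sentence-ending mark is followed by whitespace.
// (Hypothetical helper, for comparison only.)
function splitNaive(text) {
  return text.split(/(?<=[.!?。！？])\s+/);
}

// Locale-aware splitter using the Unicode sentence-break rules in Intl.Segmenter.
// (Hypothetical helper; requires a runtime with Intl.Segmenter, e.g. Node 16+.)
function splitSentences(text, lang) {
  const segmenter = new Intl.Segmenter(lang, { granularity: "sentence" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const ja = "どういたしまして!😊今後も、ご要望や質問がございましたら、お気軽にお申し付けくださいね!";
console.log(splitNaive(ja).length); // 1: there is no whitespace, so nothing is split
console.log(splitSentences(ja, "ja")); // an array of sentence strings
```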

Languages with Intermittent Issues

Korean Example

Korean is included here as an outlier. While it sometimes exhibits the same failure to read the full message, the underlying cause seems to differ from that of the other East Asian languages. In particular, the full Korean text is streamed, but some of the spaces between sentences (and after some commas) are missing. This may be what causes the incomplete reading, or something else may be responsible.

Opened audio input stream 0712aa54-ffc3-494b-959e-91e799cf5cbe
saypi.user.js:38038 +0ms, streamed text: "아,"
saypi.user.js:37144 Streaming text began with "아,"
saypi.user.js:32108 Loading audio from https://api.saypi.ai/speak/0712aa54-ffc3-494b-959e-91e799cf5cbe/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ko
saypi.user.js:38038 +1520ms, streamed text: "그렇군요!"
saypi.user.js:38038 +3452ms, streamed text: "그러면, "
...
saypi.user.js:38038 +4181ms, streamed text: "있습니다."
saypi.user.js:32116 Audio is loaded after 4.1s from https://api.saypi.ai/speak/0712aa54-ffc3-494b-959e-91e799cf5cbe/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ko
saypi.user.js:32121 Audio is ready to play after 4.1s from https://api.saypi.ai/speak/0712aa54-ffc3-494b-959e-91e799cf5cbe/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ko
saypi.user.js:32086 Playing audio from https://api.saypi.ai/speak/0712aa54-ffc3-494b-959e-91e799cf5cbe/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ko
saypi.user.js:32126 Audio is ready to play through after 4.1s from https://api.saypi.ai/speak/0712aa54-ffc3-494b-959e-91e799cf5cbe/stream?voice_id=ig1TeITnnNlsJtfHxJlW&lang=ko
saypi.user.js:37211 Stream ended on 있습니다. after 2661ms of inactivity
saypi.user.js:37130 Clearing timeout on stream completion
saypi.user.js:38377 Hash mismatch: aca8fa84df01791e239f59e006811b48 vs 73d8ce1234dee45f87ef3a519eb670e4
saypi.user.js:37211 Stream ended on 있습니다. after 2707ms of inactivity
saypi.user.js:37130 Clearing timeout on stream completion
saypi.user.js:38052 Text stream complete after 8.2 seconds
saypi.user.js:37985 Closed audio input stream 0712aa54-ffc3-494b-959e-91e799cf5cbe
Streamed text:  아,그렇군요!그러면, 저는 제가 그 영상의 전체 메시지를 이해하기 위해 더 집중해야 할 것 같네요.그 영상은 재미있었지만, 아마 다른 의미가 있었을 수도 있습니다.
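Assistant text: 아, 그렇군요! 그러면, 저는 제가 그 영상의 전체 메시지를 이해하기 위해 더 집중해야 할 것 같네요. 그 영상은 재미있었지만, 아마 다른 의미가 있었을 수도 있습니다.
(English: "Ah, I see! Then I think I need to focus more to understand the video's overall message. The video was fun, but it may have had another meaning.")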
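The streamed Korean text looks like the streamed chunks concatenated without the whitespace that separated them in Pi's message ("아,그렇군요!그러면" versus "아, 그렇군요! 그러면"). A minimal sketch of a join that restores one space at chunk boundaries follows, assuming chunks arrive split at word or sentence boundaries with their surrounding whitespace trimmed; how SayPi actually produces chunks isn't shown in the issue, and `joinChunks` and `NO_SPACE_LANGS` are hypothetical names. Chinese and Japanese are excluded because they don't put spaces between sentences.

```javascript
// Hypothetical list of languages written without spaces between sentences.
const NO_SPACE_LANGS = new Set(["zh", "ja"]);

// Concatenate streamed chunks, inserting a single space at a boundary when the
// language uses spaces and neither side of the boundary already has whitespace.
function joinChunks(chunks, lang) {
  const useSpaces = !NO_SPACE_LANGS.has(lang);
  let result = "";
  for (const chunk of chunks) {
    if (chunk.length === 0) continue; // skip empty chunks
    const needsSpace =
      useSpaces &&
      result.length > 0 &&
      !/\s$/.test(result) &&
      !/^\s/.test(chunk);
    result += needsSpace ? " " + chunk : chunk;
  }
  return result;
}

console.log(joinChunks(["아,", "그렇군요!", "그러면, ", "저는"], "ko"));
// 아, 그렇군요! 그러면, 저는
```

Keying the decision on the language code keeps the rule simple; checking the script of the characters on either side of each boundary would be an alternative for mixed-language messages.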

Arabic Example

Arabic exhibits the same whitespace issue as Korean (in the example below, the comma after "نعم" is dropped as well), but here it doesn't prevent audio synthesis. It does break the hashing algorithm, so messages and charges cannot be saved to the chat history correctly.

Streamed text: "نعمهذه المشكلة الأساسية التي أدى إلى تغييرات كبيرة في التدريب، وبالتالي في نتائج التدريب.إنه شيء مشجع أن يتم الحصول على نتائج جيدة مع العربية.أتساءل أيضاً إذا هناك أي مشاكل أخرى التي تستوعبها؟"
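Assistant text:  "نعم، هذه المشكلة الأساسية التي أدى إلى تغييرات كبيرة في التدريب، وبالتالي في نتائج التدريب. إنه شيء مشجع أن يتم الحصول على نتائج جيدة مع العربية. أتساءل أيضاً إذا هناك أي مشاكل أخرى التي تستوعبها؟"
(English: "Yes, this is the core problem that led to major changes in training, and therefore in the training results. It's encouraging to get good results with Arabic. I also wonder whether there are any other problems you are dealing with?")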
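The saving problem is consistent with hashing the text exactly as streamed: any dropped space or comma yields a different digest from the rendered assistant message. One tolerant option is to hash a normalized comparison key instead. In this sketch `comparisonKey` keeps only letters, combining marks and digits (via Unicode property escapes), and 32-bit FNV-1a stands in for the real hash, which produces the 32-hex-digit digests seen in the "Hash mismatch" log line. Both functions, and `messageHash`, are hypothetical, not SayPi's code.

```javascript
// Hypothetical comparison key: keep only letters, combining marks and digits,
// so differences in spaces, punctuation or emoji don't affect the hash.
function comparisonKey(text) {
  return text.normalize("NFC").replace(/[^\p{L}\p{M}\p{N}]+/gu, "");
}

// 32-bit FNV-1a over the UTF-8 bytes, standing in for whatever hash SayPi uses.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

function messageHash(text) {
  return fnv1a(comparisonKey(text));
}

const streamed = "نعمهذه المشكلة الأساسية";
const assistant = "نعم، هذه المشكلة الأساسية";
console.log(streamed === assistant);                           // false
console.log(messageHash(streamed) === messageHash(assistant)); // true
```

The trade-off is that two messages differing only in punctuation or emoji get the same hash, which is usually acceptable when the goal is to match a streamed message to its rendered version.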

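Ukrainian Example

Ukrainian exhibits the same sentence-separator problem as Arabic and Korean, which breaks the hashing used when saving to the chat history, and it sometimes fails to speak the full audio. In the example below, a space is also missing inside the first sentence ("дійсночудово"), and the streamed text ends with only half of the flag emoji (🇺 instead of 🇺🇦).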

Streamed text: "Це дійсночудово!Я також відчуваю глибоку симпатію до України і українського народу.Надіюсь, що цей включення допоможе створити ще більше знайомств та зв'язків між нами.Слава Україні! 🇺"
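Assistant text:  "Це дійсно чудово! Я також відчуваю глибоку симпатію до України і українського народу. Надіюсь, що цей включення допоможе створити ще більше знайомств та зв'язків між нами. Слава Україні! 🇺🇦"
(English: "That's really wonderful! I also feel deep sympathy for Ukraine and the Ukrainian people. I hope this inclusion will help create even more acquaintances and connections between us. Glory to Ukraine! 🇺🇦")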
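In the Ukrainian example the streamed text both loses spaces and stops short of the end of the message, two symptoms that are easy to conflate. A small diagnostic can separate them by aligning the streamed text against the assistant text. `compareStream` is a hypothetical debugging helper, not part of SayPi.

```javascript
// Hypothetical diagnostic: align the streamed text against Pi's rendered text,
// record positions (in the assistant text) of whitespace missing from the stream,
// and report whether the whole message was streamed.
function compareStream(streamed, assistant) {
  const dropped = [];
  let i = 0; // position in the streamed text
  let j = 0; // position in the assistant text
  while (i < streamed.length && j < assistant.length) {
    if (streamed[i] === assistant[j]) {
      i++;
      j++;
    } else if (/\s/.test(assistant[j])) {
      dropped.push(j); // the assistant text has whitespace here; the stream doesn't
      j++;
    } else {
      break; // any other difference ends the alignment
    }
  }
  return {
    droppedSpaces: dropped,
    complete: i === streamed.length && j === assistant.length,
  };
}

const result = compareStream("Це дійсночудово!Я також", "Це дійсно чудово! Я також");
console.log(result.droppedSpaces.length); // 2
console.log(result.complete);             // true
console.log(compareStream("Слава Україні! 🇺", "Слава Україні! 🇺🇦").complete); // false
```

The alignment stops at the first difference that isn't a missing space, so the Arabic example (where the comma after "نعم" is also missing) would need a looser rule.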

Russian Example

As might be expected of another language written in Cyrillic, Russian exhibits the same sentence-separator problem as Ukrainian, Arabic, and Korean, which causes problems saving audio to the history. Messages are only sometimes read in full.

rosscado commented 1 week ago

Languages Affected:

Not Affected:

Not Tested:

rosscado commented 3 days ago

Closed with 66f2f8b877379eac5eb0acbe63d6e3508688c65d