espeak-ng / espeak-ng-ios-app

Noise artifacts between batches of text #7

Closed parhamdoustdar closed 2 years ago

parhamdoustdar commented 2 years ago

Right now, when VoiceOver speaks text in batches, a very short chunk of the previously spoken text is replayed between the batches.

This might be hard to explain, so let me share the steps to reproduce:

  1. With the espeak voice selected, lock your phone
  2. Swipe right until you land on the "Show notifications" button
  3. Notice that you can hear a small part of the "s" in between "show notifications" and "button"

Let me know if this is only happening for me, so I can share an audio/screen recording.

amirsol81 commented 2 years ago

@parhamdoustdar I can't reproduce this with the language set to Persian and the voice set to Max. I remember encountering this issue on Windows a few months ago with other eSpeak voices, which are rather echoey, but Max seems to be unaffected.

parhamdoustdar commented 2 years ago

Good point, I should have shared which voice I'm using. I have been able to reproduce this with Max and English/Persian both. Also, one thing of note is that I'm using iOS 16.1, not the latest beta.

amirsol81 commented 2 years ago

Interesting - I'm also using 16.1 but can't duplicate it.

parhamdoustdar commented 2 years ago

I tested this more and it seems that, in English with Max at least, it happens once you get past the 80% rate threshold. It works fine up to and including 75%, but if you go to 80% or above, eSpeak starts breaking up and the issue I mentioned above happens. Does that help reproduce the issue?

amirsol81 commented 2 years ago

That does it. I can reproduce it with 80% and beyond. I was using 65%.

XP-Fan commented 2 years ago

Yeah well I can reproduce this too and I somehow think it might be related to https://github.com/espeak-ng/espeak-ng-ios-app/issues/6. It seems like some buffer is not flushed properly, just a wild guess of mine though.

XP-Fan commented 2 years ago

What is interesting here is that it mostly occurs when VoiceOver sends a batch of text, adds a 5S pause, and then sends another batch, which of course happens without interrupting the first one. It also happens when you listen to one string to the end and then to a second one. From what I could tell so far (though I have only used the iPhone's internal speakers), it does not occur if we swipe through items very fast, so that we always interrupt the ongoing string on purpose. So this would support my theory.

paxcoder commented 2 years ago

Maybe it's related, I hear weird clicks when moving really fast, or even normally. It's so annoying that I stopped using espeak.

djphoenix commented 2 years ago

Can you check 1.0(6)?

amirsol81 commented 2 years ago

@djphoenix This is still reproducible with 1.0.6.

djphoenix commented 2 years ago

Oh... I will look into it more closely.

KevanGP commented 2 years ago

I am having the issue too. I can go up to 76% using VoiceOver, but at 77% and above the audio skips. Furthermore, the pronunciation of some things changes at the faster rates: vowels seem more enunciated. The word "rate" sounds more like "raaate", with the A lasting longer at faster rates.

Also there's a click played about a second after speech naturally stops on some voices. Using the latest beta 06.

It's interesting that in the eSpeak app, the voice works properly at faster speech rates. But with VoiceOver it has the choppy problem.

djphoenix commented 2 years ago

Good news: in the next version, settings such as word rate and pitch will be shared between the app and VoiceOver, so you can leave the VoiceOver rate at its default and tune the speed natively in the eSpeak core.

nidza07 commented 2 years ago

Hello, I think that if you are planning to do this, you should give us a setting to choose whether we want to be able to change the rate from VoiceOver or from the app. There are advantages to both approaches.

With VoiceOver, blind people can change the rate quickly on the fly with the rotor gesture, so it's possible to make quick adjustments as you are reading specific things. On the other hand, the app offers much faster speeds, which is also quite beneficial.

djphoenix commented 2 years ago

The key problem is that VoiceOver adjusts speed by resampling the audio, whereas eSpeak changes the rate at the synthesis stage, so a rate change made in eSpeak sounds clearer. Unfortunately, I haven't found a way to read the VoiceOver setting so that the audio unit can handle it properly.

djphoenix commented 2 years ago

In the end, you can of course change the word rate in eSpeak and then adjust the rate with VoiceOver (lower or higher); that adjustment will be applied after synthesis as well.

nidza07 commented 2 years ago

Ah, got it, if that is how it will work this is completely fine, it works the same in the Android app, i.e. you can adjust the speech rate in the app too, but also via the screen reader.

I had wrongly understood that the only way to change the rate would be via the app.

XP-Fan commented 2 years ago

I guess that one isn't released yet? Or is Apple failing again?

djphoenix commented 2 years ago

Not released yet. I am waiting for one PR to be merged in espeak-ng to fix #16.

XP-Fan commented 2 years ago

I see. I just wanted to make sure you knew, in case you had forgotten to push a publish button or something. One more piece of information: the current TestFlight also offers a version of eSpeak that works on my 2018 Intel Mac mini, which is amazing.

djphoenix commented 2 years ago

Please check 1.0(7) for iOS

nidza07 commented 2 years ago

@djphoenix for me, with the latest release, now ESpeak doesn't appear in VoiceOver at all.

It happened in the past as well, but usually restarting the phone or opening the app again would fix it, but not this time.

djphoenix commented 2 years ago

Oh... I'm not sure how to fix it for everyone. To be sure, I tried both updating from the previous version and a clean install, and it works for me in both cases.

Can you try completely removing the app, then installing it from TestFlight, opening it, and then checking the VoiceOver voice list?

nidza07 commented 2 years ago

Never mind, it just appeared. It seems to take a long time sometimes for ESpeak to appear after updating, but at least it works.

nidza07 commented 2 years ago

From some quick testing, about the original issue reported here, this seems to be much better, but you have to set the rate in the app to about 460. If I do this, no matter what I configure in VoiceOver as the rate, I don't experience this issue anymore.

XP-Fan commented 2 years ago

Well I gotta say this is cool. Way less issues, also it does no longer stutter when voiceover tries to chunk something, it’s awesome. You might wonder why I say way less issues: Well I just had like a minute to check, I’ll probably only know the full picture when I carry my AirPods Pro again. But so far it seems to be just perfect, gotta say though that I’ve now bumped the rate in ESpeak to the 900 WPM and left it at the original 50 in voiceover. I’ll go up in voiceover again when I have my AirPods as through the internal speaker it doesn’t make much sense to try and push it high, it will be hard to understand.

XP-Fan commented 2 years ago

Okay, now that I've had more time, I found that, unfortunately, VoiceOver no longer seems to care much about speed changes: with eSpeak set to 900 WPM, there isn't much difference any more between 50% and 100% speed in VoiceOver. This means that 900 WPM is now effectively the upper limit. I'd appreciate it if the maximum were a bit more than double that, like in NVDA.

djphoenix commented 2 years ago

@parhamdoustdar I think I should ask you, as the author of this thread: is this fixed for you now? It seems that the latest build is not affected.

paxcoder commented 2 years ago

For me the issue is still reproducible. Even after I fixed some weird VoiceOver behaviour on my end, it still has the issue with the last phoneme being added at the end, sadly.

djphoenix commented 2 years ago

@paxcoder does it reproduce in the app and/or the console tool? You can send an example phrase and your settings so I can check it.

nidza07 commented 2 years ago

@paxcoder increase the speech rate in the app to at least 460, and if this is too fast for you, decrease it using VoiceOver's rotor afterwards. When I do this, the last phoneme is no longer added to the next spoken utterance. Can you reproduce it if you do this as well?

@djphoenix normally this can't be reproduced in the app, it happens only with VoiceOver, and even then, you need to have at least 2 different strings announced. For example, if VoiceOver says something like, rate, and you move to the next option which is pitch, you will first hear the last phoneme of the rate word which was previously spoken for a very short moment, and then the new word which is pitch. This happens only if VoiceOver is set to rate 80 or above, and seemingly as I said doesn't happen anymore if the app rate is 460 or above, in that case any rate with VoiceOver works fine.

nidza07 commented 2 years ago

A slight correction, this appears to work after 450, and not 460. So, 449 rate causes the issue, but after 450 it's fine.

djphoenix commented 2 years ago

Oh gosh... @nidza07 I think I can reproduce it now. I couldn't reproduce it before because I never raised the word rate over 300 wpm; beyond that it becomes completely unintelligible to me. I really don't know how blind people use screen readers at 400 wpm and over...

OK, now I know that there is a range where the bug appears. The upper bound is 449, and the lower bound is somewhere over 300. The question is why...

KevanGP commented 2 years ago

Seems the problem got fixed! I now have VoiceOver set to 50 for speech rate, and I can turn the rate up in the eSpeak app as fast as I want without skipping! Just remember to set the speech rate setting in the eSpeak app, that's my best suggestion. This is better than some other text-to-speech synthesizers now with this fast non-skipping rate!

tspivey commented 2 years ago

If I'm understanding this correctly, it looks like this is a bug in eSpeak:

The desired rate (e.g. the one set in VO) comes through in the SSML as a percentage. Example:

<speak><prosody rate="274.46155%"><break time="60ms" />            Untitled 8 <break time="60ms" />            window</prosody></speak>

eSpeak takes that and sets the rate based on the current rate. Rates above 449 activate sonic. My guess is there's a difference when activating it that way vs calling eSpeak functions.
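To make the arithmetic concrete, here is a minimal Python sketch of the behaviour described above. It is not eSpeak's actual code. It assumes a base rate of 175 wpm (espeak-ng's default), assumes the percentage scales that base linearly, and takes the 449 wpm boundary from this thread; the function names are hypothetical.

```python
# Illustrative only: models the description in this thread, not eSpeak's source.
BASE_RATE_WPM = 175      # assumption: espeak-ng's default speaking rate
SONIC_ABOVE_WPM = 449    # from this thread: rates above 449 activate sonic


def effective_rate(base_wpm: float, prosody_percent: str) -> float:
    """Scale a base rate by an SSML prosody percentage such as '274.46155%'."""
    return base_wpm * float(prosody_percent.rstrip("%")) / 100.0


def uses_sonic(rate_wpm: float) -> bool:
    """True when the rate is above the boundary quoted in the thread."""
    return rate_wpm > SONIC_ABOVE_WPM


rate = effective_rate(BASE_RATE_WPM, "274.46155%")
print(round(rate, 1), uses_sonic(rate))
```

Under these assumptions, the sample percentage lands above the boundary, so a request like the one above would take the sonic path even though the base rate itself is well below 449.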

This is easy to reproduce with just espeak-ng, running with espeak -m:

<speak><prosody rate="274.46155%"><break time="60ms" />            Untitled 8 <break time="60ms" />            window</prosody></speak>
<speak><prosody rate="274.46155%"><break time="1000ms" /> test</prosody></speak>

As soon as you hit enter on the second line, you'll hear a bit of audio at the beginning.
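For anyone who wants to vary the rate or the pause while testing, the two lines can also be generated. This is a small Python helper sketch; `ssml_line` is a hypothetical name, and it only builds the strings to paste into an interactive `espeak -m` session as described above. It does not call eSpeak itself.

```python
from xml.sax.saxutils import escape


def ssml_line(text_parts, rate_percent, break_ms):
    """Build one <speak> line; every text part is preceded by a <break>."""
    body = "".join(
        f'<break time="{break_ms}ms" /> {escape(part)} ' for part in text_parts
    ).rstrip()
    return f'<speak><prosody rate="{rate_percent}%">{body}</prosody></speak>'


# The two utterances from the reproduction above (whitespace collapsed).
first = ssml_line(["Untitled 8", "window"], "274.46155", 60)
second = ssml_line(["test"], "274.46155", 1000)
print(first)
print(second)
```

Changing `rate_percent` makes it easier to check on which side of the boundary the leftover audio appears.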

djphoenix commented 2 years ago

@tspivey thanks a lot for your sample!

It is really clear now that there is an issue in the core library. I will track it down (and maybe fix it myself, haha).

XP-Fan commented 2 years ago

@tspivey sorry for going off-topic, but aren't you one of the two guys behind NVDARemote? Now that we have 3rd-party TTS, wouldn't it be possible to create an NVDARemote host app for macOS? I mean, we could mod the macOS version of eSpeak to send the text strings to an NVDARemote host app and just write an app that understands keypresses coming from an NVDARemote client and performs them on macOS, just like AnyDesk does, for example. I think the keyboard Python module could even do it, but unfortunately I don't have a Python background myself; also, I guess it would be more efficient in Swift.

djphoenix commented 2 years ago

Should be fixed in 1.0(9) for iOS and 1.0(4) for macOS. @XP-Fan @tspivey @nidza07 please check.

paxcoder commented 2 years ago

Fixed, now to the App Store it goes.

djphoenix commented 2 years ago

@paxcoder it was a tricky road, haha. We are on the home straight.

beqabeqa473 commented 2 years ago

I confirm that this bug is fixed.