ImperialSquid / zotero-zotts

A Zotero plugin adding text to speech (TTS) functionality to various screens
GNU Affero General Public License v3.0

[Bug]: Pause doesn't work on Arch #115

Closed v4u6h4n closed 1 week ago

v4u6h4n commented 1 month ago

Zotero version

7.0.7

ZoTTS version

1.3.0

OS

Linux

OS (specific)

arch hyprland

Steps to reproduce

  1. Followed installation steps outlined in the readme.
  2. Error popup says failed to start TTS engine.
  3. Restarting and reinstalling zotts and zotero has no effect.

Expected behaviour

Read me a bedtime story :-)

Actual behaviour

No bedtime story was read :-(

ImperialSquid commented 1 month ago

Hi, thanks for your bug report.

v4u6h4n commented 1 month ago

Hello :-)

It was the dependencies, thanks for the help.

But maybe I'll keep this issue open, because I've noticed the pause button isn't working. Everything else seems to be working fine. I tried the Ctrl + Shift + P shortcut as well, and it doesn't work with that either. Any suggestions?

ImperialSquid commented 1 month ago

Awesome, glad it's working now!

Not being able to pause is new to me, but ZoTTS uses whatever you have installed so it may be an issue with that software instead.

Same as the last step above, navigate to the error console and paste in the following:

var utt = new window.SpeechSynthesisUtterance("Long test sentence foo bar baz")
utt.onstart = () => {console.log("start")}
utt.onend = () => {console.log("end")}
utt.onerror = (error) => {console.log(error.error)}
window.speechSynthesis.speak(utt)
setTimeout(() => {window.speechSynthesis.pause()}, 100)

See if that brings up any error codes

v4u6h4n commented 1 month ago

Not sure what I'm looking at, are those eval codes error codes? This is the output:

116
start    debugger eval code:2:30
end      debugger eval code:3:28

I installed speech-dispatcher, piper-tts-bin from the AUR, and a couple of voices, in case that's relevant. I updated and restarted just in case, and yeah, the pause still isn't working.

ImperialSquid commented 1 month ago

They're not important, I was just testing something[1]

Given that it both starts and ends without printing an error, it's not that the pause is being registered and then throwing an error (while continuing to speak); the pause just isn't being registered at all. Very strange...

Could you try uninstalling piper and installing something else like espeak or festival? That way we can test whether it's a piper problem or more widespread


[1] if you're curious, the first number is the ID for the pause timeout (you can use it to cancel the timeout, etc). And everything other than "start" and "end" is the console telling you what code gave that output, 2:30 corresponds to character 30 on line 2
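
To illustrate the footnote: the value returned by setTimeout is a handle that can be handed to clearTimeout to cancel the pending call. In the Zotero console that handle is a plain number (the 116 in the output above). The helper name schedulePause and the stand-in fakeSynth object are made up for this sketch; in the console you would pass window.speechSynthesis instead.

```javascript
// Hypothetical helper: schedule a pause and return the timeout handle.
// `synth` is anything with a pause() method, e.g. window.speechSynthesis.
function schedulePause(synth, delayMs) {
  return setTimeout(() => synth.pause(), delayMs);
}

// Stand-in object (an assumption) so the sketch runs outside a browser.
const fakeSynth = {
  pauseCalls: 0,
  pause() { this.pauseCalls += 1; },
};

const timeoutId = schedulePause(fakeSynth, 100);

// Changed our mind: cancel the pending pause before it fires.
clearTimeout(timeoutId);
```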

ImperialSquid commented 1 month ago

Actually, espeak might not work (going off that it's not in the wiki below), so festival would probably be better. Either is fine.

Make sure to follow the installation instructions from the speech-dispatcher wiki; there are some steps specific to festival and others specific to piper that you might need to do to make sure it works properly. (It may just be that doing the piper config steps fixes piper...?)

Sorry, I'm aware that's a lot of info, thanks for working with me to solve this!

v4u6h4n commented 4 weeks ago

Hello again, apologies for the delay, real-life housing issues were distracting me from the REAL important things, like having zotero read me esoteric tomes.

But back again now. I reinstalled piper, and ran through the spd-conf steps in the arch wiki, but I still haven't had any luck with the pause button. Quite frustrating because the piper voice sounds so good.

Which synthesizer are you using?

ImperialSquid commented 4 weeks ago

Yes, glad to have you back so we can sort the important issues and not those silly distractions like having a roof over your head lol

Ngl, that's incredibly frustrating news that it's still not working for you...

I'm actually technically not using a synthesiser in ZoTTS[1]

I'm not sure if I've asked but does pausing work on other applications? (eg using Firefox's Reader Mode as mentioned in the speech dispatcher wiki)


[1] For the behind the curtain details: since Zotero is built off Firefox, it has all sorts of browser APIs built in, one of which is the Web Speech API (hence why one of the preference headings is Web Speech, that's the "engine"), but really what's happening is that the browser is making calls out to whatever TTS you have installed locally (hence why speech dispatcher is needed, it acts as yet another intermediary).

The main advantages are that it works out of the box for most people ("most people" here meaning "non-Linux users", according to my bug reports lol), the quality is reasonably OK, and you don't need to pay, manage quotas, etc. (I'm currently working on Azure TTS though, so people will have more options if they want them.)

v4u6h4n commented 4 weeks ago

I just tried the firefox reader and it seems to pause fine; it starts from the beginning of the current section being read once it is resumed, but I think that's how the pause-play works, because it doesn't restart from the beginning of the article.

Okay, I think I understand what you mean regarding the Firefox API... can I technically not use a synthesizer as well to see if that fixes my issue? But I guess that's also kind of sweeping this issue under the rug >_>

ImperialSquid commented 4 weeks ago

Actually, I think I've just solved it. I went to have a look at the Firefox source to see if it makes any reference to speech dispatcher and found this, from the comment:

Speech dispatcher does not pause immediately, but waits for the speech to reach an index mark so that it could resume from that offset. There is no support for word or sentence boundaries, so index marks would only occur in explicit SSML marks, and we don't support that yet. What in actuality happens, is that if you call spd_pause(), it will speak the utterance in its entirety, dispatch an end event, and then put speechd in a 'paused' state. Since it is after the utterance ended, we don't get that state change, and our speech api is in an unrecoverable state. So, since it is useless anyway, I am not implementing pause.

So the take away is not that speech dispatcher can't pause, it's that it will only ever pause on SSML <mark> tags [1], which don't exist in the text.

My guess would be that behind the scenes, Firefox is chunking up the text into sections and keeping an internal queue. When a section gets "paused", it's actually getting cancelled, and then it's replayed later when you "resume".
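
A rough sketch of that guessed behaviour, purely for illustration: "pause" cancels the chunk being spoken, and "resume" speaks the same chunk again from its start. This is not Firefox's actual code. The class name ChunkPlayer, the injected makeUtterance factory, and the fakeSynth stand-in are all made up so the sketch can run outside a browser.

```javascript
// Sketch only: "pause" by cancelling the current chunk, "resume" by replaying it.
// `synth` needs speak() and cancel(); `makeUtterance(text)` returns an object
// that accepts an onend handler. Both are injected so the sketch runs anywhere.
class ChunkPlayer {
  constructor(synth, makeUtterance, chunks) {
    this.synth = synth;
    this.makeUtterance = makeUtterance;
    this.chunks = chunks;
    this.index = 0;       // chunk currently being (or about to be) spoken
    this.paused = false;
    this.current = null;  // utterance whose end event we still care about
  }

  play() {
    this.paused = false;
    this.speakCurrent();
  }

  speakCurrent() {
    if (this.index >= this.chunks.length) return;
    const utt = this.makeUtterance(this.chunks[this.index]);
    this.current = utt;
    utt.onend = () => {
      // Ignore end events from cancelled or superseded utterances.
      if (this.paused || this.current !== utt) return;
      this.index += 1;
      this.speakCurrent();
    };
    this.synth.speak(utt);
  }

  pause() {
    this.paused = true;
    this.synth.cancel();  // drops the current chunk; the index is kept
  }

  resume() {
    if (this.paused) this.play();  // replays the interrupted chunk
  }
}

// Stand-in synth (an assumption) that records what it was asked to speak.
const fakeSynth = {
  spoken: [],
  active: null,
  speak(utt) { this.spoken.push(utt.text); this.active = utt; },
  cancel() { this.active = null; },
};

const player = new ChunkPlayer(fakeSynth, (text) => ({ text }), ["one", "two", "three"]);
player.play();             // speaks "one"
fakeSynth.active.onend();  // "one" finishes, "two" starts
player.pause();            // "two" is cancelled part-way through
player.resume();           // "two" is spoken again from its start
console.log(fakeSynth.spoken);
```

With the real API, the same class could be given window.speechSynthesis and a factory such as (text) => new window.SpeechSynthesisUtterance(text). That matches what the Firefox reader appeared to do earlier in the thread: resuming restarts the current section rather than the whole article.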

It looks like SSML works in speech dispatcher, and is planned but unfortunately not yet implemented in Piper, so while ZoTTS could inject <mark> tags, I don't know if they would be understood yet...
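
For a sense of what injecting <mark> tags could look like, here is a minimal sketch that builds an SSML string with an empty mark before each sentence. The function names and the s0, s1, ... mark names are invented for illustration, and a full SSML document would normally also declare a version and language on the root element. Whether the marks would survive the trip through the Web Speech API, speech dispatcher and Piper is exactly the open question above, so treat this as a string-building example only.

```javascript
// Escape characters that are special in XML so the text is safe inside SSML.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap the sentences in a <speak> element, with an empty <mark> before each
// one. The mark names (s0, s1, ...) are hypothetical.
function toSsmlWithMarks(sentences) {
  const body = sentences
    .map((sentence, i) => `<mark name="s${i}"/>${escapeXml(sentence)}`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

console.log(toSsmlWithMarks(["First sentence.", "Is A < B?"]));
```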

I have a few possible ideas for how to fix this (and luckily, am planning to dual boot my computer with Linux soon so I'll be able to test them out myself), so if you're happy leaving this with me, I can work on a solution and get back to you.

Sorry I don't have an immediate fix...!


[1] SSML is "Speech Synthesis Markup Language", it allows you to fine tune your control of the output, <mark> tags are empty tags purely used for behind the scenes stuff like triggering events, they don't affect output.

ImperialSquid commented 4 weeks ago

> can I technically not use a synthesizer as well to see if that fixes my issue?

Sure, so long as it's compatible with speech dispatcher (as it looks like most engines are), it should (should) work with ZoTTS!

I'll continue working on the Piper thing in any case.

v4u6h4n commented 3 weeks ago

Oh awesome! Glad you've figured it out :-) it will be sooo good to finally use it. I'd been using TTS for listening to articles and ebooks for years, but left that behind when I adopted Zotero, so being able to finally use Zotero and TTS together will be a learning revolution (^o^)/

> Sure, so long as it's compatible with speech dispatcher (as it looks like most engines are), it should (should) work with ZoTTS!

How would I go about doing that?

> am planning to dual boot my computer with Linux soon

Oh, you're running Windows? *clicks on profile picture* I'm shocked such a flawless manicure can be cultivated outside of Linux!

github-actions[bot] commented 1 week ago

:rocket: This ticket has been resolved in v1.4.0. See Release v1.4.0 for release notes.

ImperialSquid commented 1 week ago

Hi @v4u6h4n, apologies for the slight delay.

You should be able to pause speech in Linux now!

(Just fyi, it'll only pause at the end of sentences, not immediately.)

The solution is slightly hacked together since I ended up having to manually split the sentences in a block of text and then speak them one by one. I did a fair amount of testing, so there should be no glaring issues, but do let me know if you run into any problems!
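
The general idea can be sketched as follows. This is not the actual ZoTTS code, just a minimal illustration assuming a naive regex splitter and one utterance per sentence. The names splitSentences and SentenceSpeaker are hypothetical, and the synth and utterance factory are injected so the sketch can run with a stand-in outside a browser.

```javascript
// Naive sentence splitter: cuts after runs of ., ! or ?. It will mis-split
// abbreviations such as "e.g.", so treat it as a placeholder.
function splitSentences(text) {
  const parts = text.match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Speaks one sentence per utterance. A pause request only takes effect once
// the sentence currently being spoken has finished.
class SentenceSpeaker {
  constructor(synth, makeUtterance, text) {
    this.synth = synth;
    this.makeUtterance = makeUtterance;
    this.sentences = splitSentences(text);
    this.next = 0;          // index of the next sentence to speak
    this.paused = false;
    this.speaking = false;
  }

  start() {
    this.paused = false;
    this.speakNext();
  }

  speakNext() {
    if (this.paused || this.next >= this.sentences.length) {
      this.speaking = false;
      return;
    }
    const utt = this.makeUtterance(this.sentences[this.next]);
    this.next += 1;
    this.speaking = true;
    utt.onend = () => this.speakNext();
    this.synth.speak(utt);
  }

  pause() {
    this.paused = true;  // checked when the current sentence ends
  }

  resume() {
    if (!this.paused) return;
    this.paused = false;
    if (!this.speaking) this.speakNext();
  }
}

// Stand-in synth (an assumption) that records what it was asked to speak.
const fakeSynth = {
  spoken: [],
  active: null,
  speak(utt) { this.spoken.push(utt.text); this.active = utt; },
};

const speaker = new SentenceSpeaker(fakeSynth, (text) => ({ text }), "One. Two? Three");
speaker.start();           // speaks "One."
speaker.pause();           // nothing happens yet: "One." keeps playing
fakeSynth.active.onend();  // "One." ends; the queue stops here
speaker.resume();          // continues with "Two?"
```

Because no utterance is ever interrupted, this never relies on the speech engine's own pause support, at the cost of the end-of-sentence delay mentioned above.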

v4u6h4n commented 1 week ago

ImperialSquid! You are amazing! Thank you :-)

Ah I see what you mean, the pause being delayed until the end of the sentence isn't ideal, but for me I'm happy with that, so appreciate the hackiness.

Oddly enough I've found that when I use playerctl to play/pause the playback (playerctl --player playerctld play), it actually pauses and resumes to the most recent word played.

ImperialSquid commented 1 week ago

Happy to help 😁

> Oddly enough I've found that when I use playerctl to play/pause the playback (playerctl --player playerctld play), it actually pauses and resumes to the most recent word played.

Huh, very interesting... Do you mind sharing further details about your setup?

v4u6h4n commented 1 week ago

Yeah, of course, I'm just not sure what details to provide besides: Arch Linux, and still using piper. I think playerctl just comes with Arch, but maybe I installed it at some earlier date and have forgotten, which is common. As far as I know it just uses the MPRIS D-Bus thingy to interface with playback changes, and playerctld is the daemon component that I use for remembering the previously used player when resuming playback.

ImperialSquid commented 1 week ago

Sure sorry, some more detailed questions then:

As mentioned previously, I'm not a Linux native lol, so just trying to gather as much info as possible on what's best practice/what works for people/what doesn't/etc. That way I can make my code better, or write up some docs for other people in case they get lost in the future, that kind of thing. Windows and Mac do a pretty thorough job abstracting these considerations away from people so normally it's a case of just pointing at existing docs for them, whereas Linux is incredibly varied by comparison lol!

v4u6h4n commented 1 week ago

I'm just running the command via the command line, well, technically through a bash script, but the results are the same. It just sends commands to MPRIS, which I think is doing the real work, and I'm guessing it's interfacing with ZoTTS via Firefox because it's a supported client?

I don't have any custom configs for playerctl, or at least I don't think so; sorry, I have a dissociative disorder, so I sometimes write code and forget I did it. I may have had to create the playerctld config file (outlined in the link to MPRIS), buuut I just tested using playerctl pause and playerctl play and it works the same, so I think you can ignore the playerctld commands. I think playerctld is just for switching between multiple players, and it uses playerctl for the actual MPRIS playback commands anyway.

> I'm not a Linux native lol

Never too late to start ;-) lol

v4u6h4n commented 2 days ago

Just realised skipping forward/backward in the text also works via playerctl!

ImperialSquid commented 7 hours ago

> Just realised skipping forward/backward in the text also works via playerctl!

Huh, cool!

Unfortunately, I don't think it's super useful for ZoTTS. ZoTTS uses the Web Speech API in Firefox which doesn't have anything built-in for skipping forwards/backwards (only play/pause/resume/cancel)...

So if I wanted to use this I'd have to interact with the command line directly from Zotero, which is possible, but would also mean I need to reconstruct all the functionality WSA has already...

It might be something to revisit in the future when I do get around to implementing skipping, but for now I don't think it's super useful...

Thanks for letting me know though! 😊