Closed v4u6h4n closed 1 week ago
Hi, thanks for your bug report.
var utt = new window.SpeechSynthesisUtterance("test")
utt.onstart = () => {console.log("start")}
utt.onend = () => {console.log("end")}
utt.onerror = (error) => {console.log(error.error)}
window.speechSynthesis.speak(utt)
Hello :-)
It was the dependencies, thanks for the help.
But maybe I'll keep this issue open, because I've noticed the pause button isn't working. Everything else seems to be working fine. I tried the ctrl + shift + p shortcut as well, and it doesn't work with that either. Any suggestions?
Awesome, glad it's working now!
Not being able to pause is new to me, but ZoTTS uses whatever you have installed so it may be an issue with that software instead.
Same as the last step above, navigate to the error console and paste in the following:
var utt = new window.SpeechSynthesisUtterance("Long test sentence foo bar baz")
utt.onstart = () => {console.log("start")}
utt.onend = () => {console.log("end")}
utt.onerror = (error) => {console.log(error.error)}
window.speechSynthesis.speak(utt)
setTimeout(() => {window.speechSynthesis.pause()}, 100)
See if that brings up any error codes
Not sure what I'm looking at, are those eval codes error codes? This is the output:
116
start debugger eval code:2:30
end debugger eval code:3:28
I installed speech-dispatcher, piper-tts-bin from the AUR, and a couple of voices, in case that's relevant. I updated and restarted just in case, and yeah, the pause still isn't working.
They're not important, I was just testing something[1]
Given that it both starts and ends but doesn't print an error, it's not that it's registering the pause and throwing an error (but continuing to speak); it's just not registering the pause at all. Very strange...
Could you try uninstalling piper and installing something else like espeak or festival? That way we can test whether it's a piper problem or more widespread
[1] if you're curious, the first number is the ID for the pause timeout (you can use it to cancel the timeout, etc). And everything other than "start" and "end" is the console telling you what code gave that output, 2:30 corresponds to character 30 on line 2
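For anyone following along, here is a minimal sketch of what "use the ID to cancel the timeout" means. The `paused` flag and the `pauseTimeoutId` name are made up for illustration; the flag stands in for the real `window.speechSynthesis.pause()` call so the snippet also runs outside a browser.

```javascript
// setTimeout returns a handle for the scheduled callback. In a browser
// console this is a number (like the 116 in the output above); in Node.js
// it is a Timeout object. Either way it can be passed to clearTimeout.
let paused = false;
const pauseTimeoutId = setTimeout(() => {
  paused = true; // stand-in for window.speechSynthesis.pause()
}, 100);

// Cancelling before the 100 ms elapse means the callback never runs,
// so `paused` stays false.
clearTimeout(pauseTimeoutId);
```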
Actually, espeak might not work (going off that it's not in the wiki below), so festival would probably be better. Either is fine.
Make sure to follow the installation instructions from the speech-dispatcher wiki, there's some steps that are specific to both festival and piper that you might need to do to make sure it works properly. (It may just be that doing the piper config steps fixes piper...?)
Sorry, I'm aware that's a lot of info, thanks for working with me to solve this!
Hello again, apologies for the delay, real-life housing issues were distracting me from the REAL important things, like having Zotero read me esoteric tomes.
But back again now. I reinstalled piper and ran through the spd-conf steps in the Arch wiki, but I still haven't had any luck with the pause button. Quite frustrating because the piper voice sounds so good.
Which synthesizer are you using?
Yes, glad to have you back so we can sort the important issues and not those silly distractions like having a roof over your head lol
Ngl, that's incredibly frustrating news that it's still not working for you...
I'm actually technically not using a synthesiser in ZoTTS[1]
I'm not sure if I've asked, but does pausing work in other applications? (e.g. using Firefox's Reader Mode as mentioned in the speech dispatcher wiki)
[1] For the behind the curtain details: since Zotero is built off Firefox, it has all sorts of browser APIs built in, one of which is the Web Speech API (hence why one of the preference headings is Web Speech, that's the "engine"), but really what's happening is that the browser is making calls out to whatever TTS you have installed locally (hence why speech dispatcher is needed, it acts as yet another intermediary).
The main advantages are that it works out of the box for most people ("most people" here meaning "non-Linux users" according to my bug reports lol), it's reasonably OK quality, and you don't run into needing to pay/manage quotas/etc. (I'm currently working on Azure TTS though, so people will have more options if they want them).
I just tried the Firefox reader and it seems to pause fine; it starts from the beginning of the current section being read once it is resumed, but I think that's how the pause-play works, because it doesn't restart from the beginning of the article.
Okay, I think I understand what you mean regarding the Firefox API... can I technically not use a synthesizer as well to see if that fixes my issue? But I guess that's also kind of sweeping this issue under the rug >_>
Actually, I think I've just solved it. I went to have a look at the Firefox source to see if it makes any reference to speech dispatcher and found this, from the comment:
Speech dispatcher does not pause immediately, but waits for the speech to reach an index mark so that it could resume from that offset. There is no support for word or sentence boundaries, so index marks would only occur in explicit SSML marks, and we don't support that yet. What in actuality happens, is that if you call spd_pause(), it will speak the utterance in its entirety, dispatch an end event, and then put speechd in a 'paused' state. Since it is after the utterance ended, we don't get that state change, and our speech api is in an unrecoverable state. So, since it is useless anyway, I am not implementing pause.
So the takeaway is not that speech dispatcher can't pause, it's that it will only ever pause on SSML <mark> tags [1], which don't exist in the text.
My guess would be that behind the scenes, Firefox is chunking up the text into sections and keeping an internal queue. When a section gets "paused", it's actually getting cancelled and then replayed later when you "resume".
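To make that guess concrete, here's a rough sketch of a cancel-and-replay queue. To be clear, this is not Firefox's actual code: `ChunkQueue`, `speakChunk` and `cancelSpeech` are hypothetical names, and the two callbacks are injected so the idea can be tried without a real speech engine.

```javascript
// Hypothetical sketch of "pause = cancel now, replay the same chunk later".
class ChunkQueue {
  constructor(chunks, speakChunk, cancelSpeech) {
    this.chunks = chunks;             // sections of text, in reading order
    this.speakChunk = speakChunk;     // (text, onEnd) => void, assumed engine hook
    this.cancelSpeech = cancelSpeech; // () => void, assumed engine hook
    this.index = 0;                   // chunk to speak (or replay) next
    this.paused = true;
    this.generation = 0;              // used to ignore end events from cancelled chunks
  }

  play() {
    if (!this.paused) return; // already playing
    this.paused = false;
    this._speakCurrent();
  }

  pause() {
    this.paused = true;
    this.generation += 1; // invalidate the in-flight chunk's end callback
    this.cancelSpeech();  // the chunk is dropped, not suspended
  }

  _speakCurrent() {
    if (this.paused || this.index >= this.chunks.length) return;
    const token = ++this.generation;
    this.speakChunk(this.chunks[this.index], () => {
      if (token !== this.generation) return; // stale: chunk was cancelled
      this.index += 1;                       // only advance on a real finish
      this._speakCurrent();
    });
  }
}
```

Because `index` only advances when a chunk really finishes, resuming replays the interrupted section from its start, which would match the Reader Mode behaviour described above.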
It looks like SSML works in speech dispatcher, and is planned but unfortunately not yet implemented in Piper, so while ZoTTS could inject <mark> tags, I don't know if they would be understood yet...
I have a few possible ideas for how to fix this (and luckily, am planning to dual boot my computer with Linux soon so I'll be able to test them out myself), so if you're happy leaving this with me, I can work on a solution and get back to you.
Sorry I don't have an immediate fix...!
[1] SSML is "Speech Synthesis Markup Language"; it allows you to fine-tune your control of the output. <mark> tags are empty tags purely used for behind-the-scenes stuff like triggering events; they don't affect output.
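As a purely illustrative sketch of what injecting <mark> tags could look like: `injectMarks` and `escapeXml` below are hypothetical helpers, not part of ZoTTS, and the output is a simplified SSML fragment. Whether Firefox, speech dispatcher, or Piper would actually honour the marks is exactly the open question above.

```javascript
// Escape characters that would otherwise be read as XML markup.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: put an empty <mark> before each sentence so an
// engine that understands SSML has index points it could pause on.
function injectMarks(sentences) {
  const body = sentences
    .map((sentence, i) => `<mark name="s${i}"/>${escapeXml(sentence)}`)
    .join(" ");
  return `<speak>${body}</speak>`;
}

const ssml = injectMarks(["Hi there.", "A < B."]);
// ssml is:
// <speak><mark name="s0"/>Hi there. <mark name="s1"/>A &lt; B.</speak>
```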
can I technically not use a synthesizer as well to see if that fixes my issue?
Sure, so long as it's compatible with speech dispatcher (as it looks like most engines are), it should (should) work with ZoTTS!
I'll continue working on the Piper thing in any case.
Oh awesome! Glad you've figured it out :-) it will be sooo good to finally use it. I'd been using TTS for listening to articles and ebooks for years, but left that behind when I adopted Zotero, so being able to finally use Zotero and TTS together will be a learning revolution (^o^)/
Sure, so long as it's compatible with speech dispatcher (as it looks like most engines are), it should (should) work with ZoTTS!
How would I go about doing that?
am planning to dual boot my computer with Linux soon
Oh, you're running Windows? *clicks on profile picture* I'm shocked such a flawless manicure can be cultivated outside of Linux!
:rocket: This ticket has been resolved in v1.4.0. See Release v1.4.0 for release notes.
Hi @v4u6h4n, apologies for the slight delay.
You should be able to pause speech in Linux now!
(Just fyi, it'll only pause at the end of sentences, not immediately.)
The solution is slightly hacked together since I ended up having to manually split the sentences in a block of text and then speak them one by one. I did a fair amount of testing, so there should be no glaring issues, but do let me know if you run into any problems!
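For the curious, the general approach looks roughly like the sketch below. It is a simplified illustration and not the actual ZoTTS code: `splitSentences` and `createSentenceReader` are hypothetical names, the splitter is deliberately naive (it will trip over abbreviations like "e.g."), and `speak` is injected so the logic can be tried without a speech engine.

```javascript
// Naive splitter: break on whitespace that follows ., ! or ?
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Speaks one sentence at a time. A pause request only takes effect once
// the current sentence has finished, which is why pausing is not immediate.
function createSentenceReader(text, speak) {
  const sentences = splitSentences(text);
  let next = 0;               // index of the next sentence to speak
  let pauseRequested = false;
  let speaking = false;

  function speakNext() {
    if (pauseRequested || next >= sentences.length) {
      speaking = false;
      return;
    }
    speaking = true;
    const sentence = sentences[next];
    next += 1;
    speak(sentence, speakNext); // speak(text, onEnd)
  }

  return {
    play() {
      pauseRequested = false;
      if (!speaking) speakNext();
    },
    pause() {
      pauseRequested = true;
    },
  };
}

// In a browser, `speak` could be wired to the Web Speech API like this:
//   (text, onEnd) => {
//     const utt = new SpeechSynthesisUtterance(text);
//     utt.onend = onEnd;
//     window.speechSynthesis.speak(utt);
//   }
```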
ImperialSquid! You are amazing! Thank you :-)
Ah I see what you mean, the pause being delayed until the end of the sentence isn't ideal, but for me I'm happy with that, so appreciate the hackiness.
Oddly enough I've found that when I use playerctl to play/pause the playback (playerctl --player playerctld play), it actually pauses and resumes to the most recent word played.
Happy to help 😁
Oddly enough I've found that when I use playerctl to play/pause the playback (playerctl --player playerctld play), it actually pauses and resumes to the most recent word played.
Huh, very interesting... Do you mind sharing further details about your setup?
Yeah of course, just not sure what details to provide besides: Arch Linux and still using piper. I think playerctl just comes with Arch, but maybe I installed it at a previous date and have forgotten, which is common. It just uses the MPRIS D-Bus thingy to interface with playback changes as far as I know, and playerctld is the daemon component that I use for remembering the previous player that was used when resuming playback.
Sure sorry, some more detailed questions then:
As mentioned previously, I'm not a Linux native lol, so just trying to gather as much info as possible on what's best practice/what works for people/what doesn't/etc. That way I can make my code better, or write up some docs for other people in case they get lost in the future, that kind of thing. Windows and Mac do a pretty thorough job abstracting these considerations away from people so normally it's a case of just pointing at existing docs for them, whereas Linux is incredibly varied by comparison lol!
I'm just running the command via the command line, well technically through a bash script, but the results are the same. It just sends commands to MPRIS, which I think is doing the real work, and I'm guessing it's interfacing with ZoTTS via Firefox because it's a supported client?
I don't have any custom configs for playerctl, or at least I don't think so; sorry, I have a dissociative disorder so I sometimes write code and forget I did it. I may have had to create the playerctld config file (outlined in the link to MPRIS), buuut I just tested using playerctl pause and playerctl play and it works the same, so I think you can ignore the playerctld commands, as I think it's just for switching between multiple players, and just uses playerctl for the actual MPRIS playback commands anyway.
I'm not a Linux native lol
Never too late to start ;-) lol
Just realised skipping forward/backward in the text also works via playerctl!
Just realised skipping forward/backward in the text also works via playerctl!
Huh, cool!
Unfortunately, I don't think it's super useful for ZoTTS. ZoTTS uses the Web Speech API in Firefox, which doesn't have anything built in for skipping forwards/backwards (only speak/pause/resume/cancel)...
So if I wanted to use this I'd have to interact with the command line directly from Zotero, which is possible, but would also mean I need to reconstruct all the functionality WSA has already...
It might be something to revisit in the future when I do get around to implementing skipping, but for now I don't think it's super useful...
Thanks for letting me know though! 😊
Checklist
Zotero version
7.0.7
ZoTTS version
1.3.0
OS
Linux
OS (specific)
arch hyprland
Steps to reproduce
Expected behaviour
Read me a bedtime story :-)
Actual behaviour
No bedtime story was read :-(