erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, however supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, wav file maintenance. It can also be used with 3rd Party software via JSON calls.
GNU Affero General Public License v3.0

Can't disable setting "Pass Asterisks to TTS Engine" #270

Closed uberubert closed 2 months ago

uberubert commented 2 months ago

Problem: Every time I load the interface, it re-enables the setting for "Pass Asterisks to TTS Engine".

image

Expected behaviour: I want the setting to stay disabled after I disabled it, even after reloading the SillyTavern webapp.

What I tried:

I snooped around in the code and found that it clicks the checkbox and overrides its setting based on some other setting. This happens during website init/loading. I found that this happens in two separate places in the code:

By removing these if statements completely, the setting seems to stay disabled, even after reloading the SillyTavern webapp. I can't really tell if this will have any bad effects, but my desire to avoid having asterisk text spoken by the TTS is now fulfilled, if only temporarily.

image

erew123 commented 2 months ago

Hi @uberubert

Please see https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-a-note-on-character-cards--greeting-messages, which explains why asterisks are passed over to AllTalk. As such, enabling the Narrator is what switches "Pass Asterisks" on or off, and the asterisks are not pronounced as the word "asterisk" but are used to delineate Character or Narrator spoken text.
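To illustrate the idea of asterisks and quotes acting as delimiters rather than spoken words, here is a minimal Python sketch. It is not AllTalk's actual code; the function names `split_segments` and `character_only` and the regular expression are hypothetical, and it assumes plain `*...*` and `"..."` pairs with no nesting.

```python
import re

# Hypothetical illustration only, NOT AllTalk's implementation.
# Assumes un-nested *narration* and "speech" spans.
SEGMENT_RE = re.compile(r'\*(?P<narrator>[^*]+)\*|"(?P<character>[^"]+)"')


def split_segments(text):
    """Return (kind, text) pairs; kind is 'narrator', 'character' or 'other'."""
    segments = []
    pos = 0
    for m in SEGMENT_RE.finditer(text):
        # Text between delimited spans belongs to neither voice by default.
        leftover = text[pos:m.start()].strip()
        if leftover:
            segments.append(("other", leftover))
        kind = m.lastgroup  # name of the alternative that matched
        segments.append((kind, m.group(kind).strip()))
        pos = m.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(("other", tail))
    return segments


def character_only(text):
    """Keep only the quoted character speech, dropping everything else."""
    return [t for kind, t in split_segments(text) if kind == "character"]
```

For example, `character_only('*She smiles.* "Hello there!" and waves')` keeps only the quoted line, which is the behaviour the original poster was hoping for with the narrator disabled.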

Hope that helps clarify.

Thanks

erew123 commented 2 months ago

Just FYI, you can test this behaviour out with the provided CURL commands in the API section (replacing the settings with those that match your chosen voices and the current TTS engine).

https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-example-command-lines-standard-generation
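For anyone who would rather script that test than type CURL commands, a rough Python equivalent could look like the sketch below. The URL, port and form field names are assumptions modelled on the README's example command, so please verify them against the linked section; the helper `build_tts_request` and the voice file names are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: default local address and endpoint; check the README for your install.
API_URL = "http://127.0.0.1:7851/api/tts-generate"


def build_tts_request(text, narrator_enabled,
                      character_voice="female_01.wav",
                      narrator_voice="male_01.wav"):
    """Build (but do not send) a form-encoded POST for a TTS generation test."""
    # ASSUMPTION: field names and values mirror the README example; adjust as needed.
    fields = {
        "text_input": text,
        "text_filtering": "standard",
        "character_voice_gen": character_voice,
        "narrator_enabled": "true" if narrator_enabled else "false",
        "narrator_voice_gen": narrator_voice,
        "text_not_inside": "character",
        "language": "en",
        "output_file_name": "asterisk_test",
        "output_file_timestamp": "true",
        "autoplay": "false",
        "autoplay_volume": "0.8",
    }
    body = urlencode(fields).encode("utf-8")
    return Request(API_URL, data=body, method="POST")
```

Passing the returned object to `urllib.request.urlopen` sends it to a running server. Building the same text once with `narrator_enabled=True` and once with `False` makes it easy to compare how the asterisk text is handled.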

erew123 commented 2 months ago

Finally, on v2, if you have a specific type of text you do/don't want to pass through to the underlying TTS engine (post-processing the initial text for character/narrator etc.), you can do that within the AllTalk interface > Global Settings > AllTalk API Defaults; scroll down to API Allowed Text Filtering/Passthrough Settings.

image

Thanks

erew123 commented 2 months ago

Filtering is applied at the point where the chart turns red, as the TTS is generated and before it is processed by RVC etc.

AllTalk API Process

uberubert commented 2 months ago

I think this issue is more related to UX than how the plugin works technically.

My narrator setting was disabled, so I expected only the quoted "character speaking" text to be voiced. To my surprise, the entire thing was voiced, even the text in asterisks. It was voiced because the option was re-enabled automatically when I reloaded the webapp.

So it seems I have to keep the narrator enabled in order to avoid asterisk text being spoken. But then there's no way to reduce the spoken voice to just the quoted "character speaking" text, unless I manually go over the settings every time I load the app.

To me this makes little sense, as it only serves to override my ability to manipulate options given to me. If this option absolutely must be set to a specific value, then would it not make more sense to remove this option? As a user, I would expect this option to stay the way I put it, or not be there at all.

Also, should I really switch to V2? I always prefer stable over beta, which is why I stick with V1 for now. (Oh, and I don't mean to come across as critical here, I'm loving the generated voice lines!)

erew123 commented 2 months ago

Hi @uberubert

In v2 there are "Enabled Silent" options for narrator and text not inside. It's detailed in the Gradio interface documentation; however, here is a partial snippet:

image

With SillyTavern, it does require the extension to be updated to the v2 extension (instructions here): https://github.com/erew123/alltalk_tts/tree/alltalkbeta/system/SillyTavern%20Extension/For%20AllTalk%20V2

As for v2, the core codebase is pretty solid and I've had little issue with it. What remains is mainly 40-60 hours of documentation improvement/writing, finishing the Google Colab, and creating a Docker build for people who use Docker, and I may yet add other features listed here: https://github.com/erew123/alltalk_tts/discussions/74

It's purely your choice.

Thanks