Unable to match newline in advanced configuration tab

Rapti commented 3 years ago

I'm trying to add a substitution rule for newlines in ATTS' advanced configuration tab, but I'm unable to do so. I've tried:

\n
\r
\r\n
A literal line break
\u0085
 
… with convert any newline(s) in input into an ellipsis checked
[\n\r]+

I tried all of these both as regex and as plain text, except the last one, obviously. Using $ is also not an option because the interface doesn't let me change any modifiers other than i. Granting control over modifiers would also be great, but that's another issue.

Am I missing something? If not, please make this possible. I really like ATTS otherwise. Thanks!

luc-vocab commented 3 years ago

Please tell me what you have in your note (please show the html as well), and what you expect on the output

Rapti commented 3 years ago

They look like this: line 1<br>line 2

I dug a little deeper and I think the problem is that ATTS, according to the Other Notes section of this docs page, removes any HTML very early on and before processing the user's advanced substitution rules. The result is that even the convert any newline(s) in input into an ellipsis setting doesn't see any traces of those line breaks.

I think the problem can be solved with any of the following changes:

Process user substitutions before removing HTML tags
Convert any <br> tags into actual newlines before stripping off the rest of the HTML
Include handling of <br> tags in the convert any newline(s) in input into an ellipsis setting and make sure they don't get stripped away before that
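The second option above could look roughly like the following. This is only a minimal sketch in plain Python, not AwesomeTTS code: the function names `field_html_to_text` and `newlines_to_ellipsis` are made up for illustration, and the tag stripping is a simple regex rather than whatever the add-on actually uses.

```python
import html
import re

# Hypothetical helpers for illustration; these names are not part of AwesomeTTS.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def field_html_to_text(field_html: str) -> str:
    """Turn <br> tags into real newlines, then drop the remaining HTML."""
    with_newlines = BR_TAG.sub("\n", field_html)
    without_tags = ANY_TAG.sub("", with_newlines)
    return html.unescape(without_tags)


def newlines_to_ellipsis(text: str) -> str:
    """Rough stand-in for the 'convert newlines into an ellipsis' setting."""
    return re.sub(r"\s*[\r\n]+\s*", " ... ", text).strip()


plain = field_html_to_text("une fusée<br>une roquette<br />un missile")
spoken = newlines_to_ellipsis(plain)
```

With this ordering, a user rule such as [\n\r]+ would have real newlines to match, because the <br> tags are converted before the rest of the markup is removed.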

luc-vocab commented 3 years ago

OK, thanks for the details. The first thing I'll try to do is reproduce it in a unit test. I wrote such unit tests a while back to test some things around text processing. Let me get back to you a bit later.

luc-vocab commented 3 years ago

One more thing: what's the ultimate audio output that you're looking for?

Rapti commented 3 years ago

I'm trying to achieve a pause.

Consider this German to French vocabulary card for instance:

Rakete [3x]
---------------
une fusée
une roquette
un missile

Right now, it is read as if the translations were all on one line. I'd like ATTS to insert a comma, an ellipsis, or something similar to prevent this. As I mentioned above, the existing convert any newline(s) in input into an ellipsis setting doesn't work because Anki stores line breaks as HTML.

luc-vocab commented 3 years ago

Hi @Rapti, the following worked for me (the three words are pronounced with a pause between them):

In the AwesomeTTS configuration, Text tab, Handling text from a note field, check "convert newlines in an input to ellipsis".
Then, in the Advanced tab on the left, add a replacement rule to replace ... with <break time="2s"/>. This requires a service that supports SSML, such as Azure, Google, or Amazon.
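The combined effect of these two steps can be sketched as follows. This is an illustration in plain Python, not the add-on's code: `add_ssml_breaks` is a hypothetical name, and the sketch assumes the field text still contains real newline characters when the rule runs.

```python
import re


def add_ssml_breaks(text: str, pause: str = "2s") -> str:
    """Hypothetical helper: newline -> ellipsis -> SSML break, as in the two steps."""
    # Step 1: what the "convert newlines to ellipsis" checkbox is assumed to do.
    with_ellipsis = re.sub(r"[\r\n]+", " ... ", text)
    # Step 2: the replacement rule from the Advanced tab (plain text, not regex).
    return with_ellipsis.replace("...", f'<break time="{pause}"/>')


ssml_text = add_ssml_breaks("une fusée\nune roquette\nun missile")
```

If the line breaks have already been removed before step 1 runs, there is no newline left to convert and the text passes through unchanged.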

Rapti commented 3 years ago

Hi, I've tried this in both the upper and lower sections, but it does nothing. Maybe your Anki somehow stores line breaks as actual line breaks instead of as <br> tags.

luc-vocab commented 3 years ago

@Rapti please send me your deck. From the Anki main screen, click File → Export, then click Export... . Then locate the apkg file you just created and upload it to https://www.dropbox.com/request/X9L4mhPqEcfvKFuSm2nK

telotortium commented 2 years ago

I ran into this issue while trying to match on newline.

The culprit is the HTML sanitization rule run on the field text. This is run before the custom substitution rules, and strips all HTML, including converting <br> tags to a single space.

I was trying to extract only the Chinese text in fields that look like this:

你好<br><br>Hello

I was able to work around this for my purposes (to send only the first line of my field, which contains my source text) by checking the option to convert newlines to ellipsis, and then using the following substitution in the Advanced tab (making sure to check the "regex" option): pattern ^(.*?)[ ][.][.][.][ ].*, replace with \1.
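To see what this rule does, here is the same pattern applied with Python's re module. The input string "你好 ... Hello" is an assumption about what the field looks like after sanitization and the ellipsis conversion; the exact intermediate text inside the add-on may differ.

```python
import re

# The pattern and replacement from the workaround above.
FIRST_LINE_RULE = re.compile(r"^(.*?)[ ][.][.][.][ ].*")


def keep_first_line(processed_text: str) -> str:
    """Keep only the text before the first ' ... ' separator, if there is one."""
    return FIRST_LINE_RULE.sub(r"\1", processed_text)


first = keep_first_line("你好 ... Hello")
untouched = keep_first_line("Hello")
```

The lazy group (.*?) stops at the first " ... ", and the trailing .* consumes everything after it, so text without a separator is left as it is.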

It would be nice to at least have the option of running custom substitution rules on the HTML, as opposed to the sanitized text.
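Such an option might be sketched like this. Everything here is hypothetical: `strip_html`, `apply_rules`, `process_field`, and the `rules_see_html` flag are illustrative names and not part of AwesomeTTS, and the tag stripping is a simplified stand-in for the real sanitizer.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement)


def strip_html(text: str) -> str:
    """Simplified stand-in for the sanitizer: every tag becomes a space."""
    return re.sub(r"<[^>]+>", " ", text)


def apply_rules(text: str, rules: List[Rule]) -> str:
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def process_field(field_html: str, rules: List[Rule], rules_see_html: bool = False) -> str:
    """Run the user rules either before or after the HTML is stripped."""
    if rules_see_html:
        return strip_html(apply_rules(field_html, rules)).strip()
    return apply_rules(strip_html(field_html), rules).strip()


# Keep only what precedes the first <br>.
rules = [(r"<br\s*/?>.*", "")]
on_html = process_field("你好<br><br>Hello", rules, rules_see_html=True)
on_text = process_field("你好<br><br>Hello", rules)
```

With rules_see_html=True the rule can match the <br> tag directly; with the default ordering the tag is already gone, so the rule never fires.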

luc-vocab commented 2 years ago

@telotortium please contact me by email at awesometts@airpost.net. I'm looking for technical users to test-drive AwesomeTTS2 (codenamed HyperTTS), which is supposed to completely revamp text processing.

luc-vocab commented 2 years ago

In case anyone needs advanced text processing logic, please try the new addon, HyperTTS: https://ankiweb.net/shared/info/111623432. It has much more powerful and transparent capabilities for processing text prior to TTS generation.

telotortium commented 2 years ago

Thanks, I was able to directly do what I want (match on <br> tags) using HyperTTS.

luc-vocab commented 2 years ago

@telotortium great, if HyperTTS works for you, please leave a review on the addon page.

9ycbgf0k8fpg commented 2 years ago

Hello

I just spent a whole day debugging this issue. At first I thought it was a problem with the addon itself, because when you highlight text and use "right click > Say", the newlines are interpreted correctly. After wasting a few hours debugging the addon, I understood that the issue is with Anki itself: when you use the TTS tag, it strips all HTML and newlines before feeding the text to the player. (I think it is a pretty stupid idea to make it mandatory, but whatever.) I then tried to build Anki from source to disable this behavior, but I wasted another hour and didn't succeed. I then saw the new TTS tag format, where you can put the text you want read between the tags, and I tried to inject the card's content between the tags with JavaScript, but it turns out JavaScript is interpreted only after the TTS tags are, so it was useless.

I finally found a working solution! Here it is: [anki:tts lang=en_EN t="{{Extra}}"][/anki:tts]

If you feed the content you want read by TTS in another parameter (here I used "t"), Anki doesn't strip it. It is the way I found to "exfiltrate" the raw card data to the addon.

Then I can change ttsplayer.py and use this code instead of text = tag.field_text: text = tag.other_args[0][2:].replace("<br>", "...")
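A slightly more defensive version of that one-liner might look like the sketch below. `FakeTTSTag` is a stand-in written only for this example (it just mimics the field_text and other_args attributes used above), and the sketch assumes the extra argument arrives as a string of the form t=... with the raw field storing line breaks as <br> tags; how Anki actually passes the argument has not been checked here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeTTSTag:
    """Stand-in for the tag object; only the two attributes used above."""
    field_text: str
    other_args: List[str] = field(default_factory=list)


def text_from_tag(tag: FakeTTSTag) -> str:
    """Prefer the raw text passed via t=..., falling back to the stripped field."""
    if tag.other_args and tag.other_args[0].startswith("t="):
        raw = tag.other_args[0][2:]  # drop the leading "t="
        return raw.replace("<br>", "...")
    return tag.field_text


with_arg = text_from_tag(FakeTTSTag("une fusée une roquette", ["t=une fusée<br>une roquette"]))
without_arg = text_from_tag(FakeTTSTag("une fusée une roquette"))
```

The fallback keeps ordinary TTS tags working when no t argument is given.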

I hope my suffering is useful to someone else :-)

luc-vocab commented 2 years ago

@9ycbgf0k8fpg have you tried HyperTTS? It has much better text processing capabilities, and you can observe your changes in real time to ensure the text replacement is working as expected.

AwesomeTTS / awesometts-anki-addon

Unable to match newline in advanced configuration tab #209