152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)
GNU Affero General Public License v3.0

Is it possible to regenerate bad clips? #76

Open Angrod opened 1 year ago

Angrod commented 1 year ago

Around the minute-and-a-half mark, and every so often after that, a clip will repeat the last thing said or make a groaning noise. Everything else sounds perfect, but is there a way to regenerate just those specific clips?

CodexOmega commented 1 year ago

read.py --regenerate

Angrod commented 1 year ago

read.py --regenerate

There is no read.py in this fork.

CodexOmega commented 1 year ago

read.py --regenerate

no read.py in this fork

Yep, sorry about the confusion. I guess there is no way of regenerating them. ☕
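For reference, my understanding is that upstream TorToiSe's read.py takes --regenerate as a comma-separated list of clip numbers, re-synthesizes only those clips, and reloads the rest from the files saved on the previous run. Nothing equivalent is confirmed for this fork, so here is a minimal sketch of the same idea in plain Python. `synthesize` is a hypothetical stand-in for whatever TTS call you use, and clips are kept in an in-memory dict rather than on disk; both are assumptions.

```python
def parse_clip_list(spec):
    """Parse a comma-separated list such as "3,7,12" into a set of clip indices."""
    if not spec:
        return set()
    return {int(part) for part in spec.split(",") if part.strip()}


def render_with_regeneration(segments, cache, synthesize, regenerate):
    """Re-synthesize only the clips in `regenerate`; reuse cached clips otherwise.

    segments:   list of text segments, one per clip (index = clip number)
    cache:      dict mapping clip index -> audio from an earlier run (updated in place)
    synthesize: hypothetical callable, text -> audio
    regenerate: set of clip indices to redo
    """
    clips = []
    for i, text in enumerate(segments):
        if i in regenerate or i not in cache:
            cache[i] = synthesize(text)  # new take for this clip only
        clips.append(cache[i])
    return clips
```

A call like `render_with_regeneration(segments, cache, tts, parse_clip_list("3,7"))` redoes clips 3 and 7 and keeps every other clip from the earlier run; with real audio you would persist `cache` between runs (e.g. one wav per index) and concatenate the returned clips.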

eloop001 commented 1 year ago

Hi Matt, you can avoid this by adding pipe (|) characters to the text. If the application finds these characters, it uses them as the split points instead of inserting its own via the tokenizer.py script. The automatic splitting is pretty naïve and there is room for improvement; I guess it has not been an area of much attention. However, you can write your own function to do this, and you should be golden.

In that file you can also add your own abbreviations relevant to your text. One way to find them is to check the Hugging Face tutorials on how to build a wav2vec2 tokenizer. You are not actually going to make a tokenizer, but as an intermediate step the code builds statistics on the occurrence of all words. If you write the less frequent words to a text file, you may spot abbreviations. Note that you would want to include "." as part of the character set, but only ".", not ". " (punctuation followed by a space). I recommend https://huggingface.co/blog/fine-tune-wav2vec2-english, around the middle of the article.
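A rough way to do the word-frequency step without the tutorial code: this sketch is my own substitute, not the Hugging Face tutorial's code. It tokenizes with a regex that keeps "." inside words (so ". " acts as a boundary), counts tokens, and lists rare dotted tokens as abbreviation candidates. `max_count` and the regex are assumptions to tune, and sentence-final words such as "fine." will also show up, so the list still needs a manual pass.

```python
import re
from collections import Counter

# Letters, apostrophes and "." count as word characters; spaces, commas, quotes
# etc. end a token. So "Dr. Smith" yields "Dr." and "Smith": the "." stays,
# the ". " boundary does not.
TOKEN_RE = re.compile(r"[A-Za-z][A-Za-z.']*")


def rare_dotted_tokens(text, max_count=2):
    """Lower-cased tokens containing '.' that occur at most max_count times."""
    counts = Counter(tok.lower() for tok in TOKEN_RE.findall(text))
    return sorted(tok for tok, n in counts.items() if n <= max_count and "." in tok)


def write_candidates(text, path, max_count=2):
    """Write the candidate abbreviations to a text file, one per line, for review."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(rare_dotted_tokens(text, max_count)) + "\n")
```

Anything that turns out to be a real abbreviation can then be added to the abbreviation list mentioned above.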

This happens because if a segment is too long, the model can start to hallucinate, while if segments are too short, you risk losing context.
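A minimal sketch of that trade-off, assuming the rule described above (if the text contains "|", use only those split points) and otherwise packing whole sentences greedily up to a character budget. `split_for_tts` and the `max_len=300` default are hypothetical; this is not the fork's actual splitter.

```python
import re


def split_for_tts(text, max_len=300):
    """Split text into TTS segments (hypothetical helper, not the fork's splitter).

    If the text contains '|', those are the only split points. Otherwise whole
    sentences are packed greedily into segments of at most max_len characters:
    long enough to keep some context, short enough to limit hallucination.
    A single sentence longer than max_len is kept whole.
    """
    if "|" in text:
        return [seg.strip() for seg in text.split("|") if seg.strip()]
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_len:
            segments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        segments.append(current)
    return segments
```

Lowering `max_len` trades context for fewer runaway clips; the right value probably depends on the voice and the text.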

regs,

eloop

(Sent by email in reply to Matt's post of Tue, 25 Apr 2023, 06:51.)

--

Mads Voigt Hingelberg CEO & PARTNER

M: +45 4141 6181 <callto:+4541416181> W: www.innovationlab.dk http://www.innovationlab.dk/

Mariane Thomsens Gade 6.st, 8000 Aarhus C, Denmark

AARHUS // COPENHAGEN // BERGEN // SAN FRANCISCO // DUBAI // CAMBRIDGE

Angrod commented 1 year ago

Although it's not intuitive right off the bat, I think this will be extremely helpful! Thank you for the quick breakdown and the tutorial links; I'll be able to read through them a little later today.

When using a pipe, does it matter where it falls in the text? I'm assuming it is better to put the pipe immediately after a punctuation mark, without a space.
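One way to place pipes right after sentence punctuation with no space, as a sketch: replace the whitespace following ".", "!" or "?" (optionally followed by a closing quote) with "|", except after known abbreviations. `add_pipes` and `ABBREVIATIONS` are hypothetical names, and the abbreviation set is only a starter you would extend for your text.

```python
import re

# Hypothetical starter set; extend it with abbreviations found in your own text.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st."}

# A word ending in . ! or ?, optionally followed by a closing quote, then whitespace.
_BOUNDARY = re.compile(r"(\S*[.!?][\"'\u201d\u2019]?)\s+")


def add_pipes(text, abbreviations=ABBREVIATIONS):
    """Replace the whitespace after sentence punctuation with '|' (no space)."""
    def repl(match):
        word = match.group(1)
        if word.lower() in abbreviations:
            return match.group(0)  # keep "Dr. Smith" together
        return word + "|"
    return _BOUNDARY.sub(repl, text)
```

For example, `add_pipes("Dr. Smith arrived. He sat down.")` leaves "Dr. Smith" intact and puts a single pipe after "arrived.".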

eloop001 commented 1 year ago

It's certainly better to do it after a punctuation mark. I ran into the issue that sentences between punctuation marks would still need to be shorter; you can look in the tokenizer.py file to check the recommended length. You should also avoid splitting inside quotation marks, e.g. "I cannot do it", said Henry. To make that sound right, you should really prioritize splitting at the punctuation mark outside the quote. Consequently, the "rule book" for splitting may be quite long...
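To illustrate the quotation-mark rule, here is a sketch that only allows splits right after punctuation outside double quotes and packs the text into segments under a length budget. The quote tracking is a simple toggle that assumes quotes are balanced, and the `max_len=200` default is made up; both are assumptions, and a real rule book would need many more cases.

```python
def split_points_outside_quotes(text, marks=",;:.!?"):
    """Indices just after punctuation marks that are not inside quotation marks."""
    points, inside = [], False
    for i, ch in enumerate(text):
        if ch == '"':
            inside = not inside  # straight quotes: toggle (assumes balanced)
        elif ch == "\u201c":
            inside = True  # left curly quote opens
        elif ch == "\u201d":
            inside = False  # right curly quote closes
        elif ch in marks and not inside:
            points.append(i + 1)
    return points


def split_outside_quotes(text, max_len=200, marks=",;:.!?"):
    """Pack text into segments of at most max_len chars, splitting only at allowed points.

    If no allowed point fits within max_len, the segment runs on to the next
    allowed point, so it can exceed max_len.
    """
    points = split_points_outside_quotes(text, marks)
    segments, start = [], 0
    while start < len(text):
        if len(text) - start <= max_len:
            end = len(text)
        else:
            fitting = [p for p in points if start < p <= start + max_len]
            later = [p for p in points if p > start]
            end = fitting[-1] if fitting else (later[0] if later else len(text))
        piece = text[start:end].strip()
        if piece:
            segments.append(piece)
        start = end
    return segments
```

With the example above and a tight budget, the comma inside "I cannot do it, Henry" is never used as a split point; the first split lands on the comma after the closing quote.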

It's not an exact science, and it may depend heavily on the text you want to run TTS on, so you could, for example, write special logic tuned to the content.

Since it's not an exact science, perhaps it would make sense to train an AI to choose split points where they make the most sense in the text... hmmm... meditate on this, I will... :-)

I'll be happy to work on a revamped version of this functionality if you'd like.

regs,

eloop

(Sent by email in reply to Matt's comment of Mon, 1 May 2023, 20:40.)

eloop001 commented 1 year ago

Just an example: because of Kafka's writing style, it would be trivial to split this text on punctuation marks, but other texts are not that semantically graceful.

https://www.gutenberg.org/cache/epub/7849/pg7849.txt

Angrod commented 1 year ago

Definitely something I'll be testing and tweaking as I go. I also need to work out how long a segment can be before the output starts acting up. I'm not confident in my AI training abilities yet, but soon!