LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with KoboldAI's UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

bark.cpp as TTS #912

Closed: tororon1231 closed this issue 2 weeks ago

tororon1231 commented 3 weeks ago

Please integrate bark.cpp (from this repository) into koboldcpp as a TTS backend. It would be nice to have it running natively.

LostRuins commented 3 weeks ago

Bark's speech quality is very poor, and it is also very slow. Have you tried XTTS or the built-in browser TTS?

tororon1231 commented 3 weeks ago

Neither the built-in browser TTS nor XTTS can produce non-speech sounds and expressions (such as laughing or crying), but bark can. The built-in browser TTS can even sound worse than bark. XTTS is good, but it requires setting up an additional server, whereas having TTS running natively in koboldcpp would be better and easier. From a practical point of view, bark has its limitations, but bark.cpp could serve as a basis for supporting more and better TTS models in C++ in the future, as its developer intends (1). Besides, bark is not that bad (the individual words it produces in a sentence are distinguishable). But I understand if you think it is not worth implementing in koboldcpp.

tororon1231 commented 2 weeks ago

@LostRuins I have just produced two sentences with bark, and it is actually quite decent. Have a listen.

Transcription: I couldn’t believe my eyes when I saw the puppy run towards me, its tail wagging as if to say, ‘I’ve missed you!’ [laughs] And then, out of nowhere, the lights flickered off, leaving us in an eerie silence that seemed to whisper secrets from the shadows.

https://github.com/LostRuins/koboldcpp/assets/172286123/86d464c9-2b7f-4909-8ec4-293d447b20e5

tororon1231 commented 2 weeks ago

@LostRuins So what do you think? Does bark.cpp or something similar for TTS have potential to be included in koboldcpp at some point in the future?

tororon1231 commented 2 weeks ago

Here is an earlier discussion of this idea by someone else in your repository, so I am closing this issue.

LostRuins commented 2 weeks ago

Yeah, if you compare that sample with the XTTS voice that I used inside the latest 1.67 release video, they are worlds apart.

tororon1231 commented 2 weeks ago

Well, yes, they are different. XTTS pronounces every word in the video perfectly, which makes it the best choice for storytelling, while Bark has small imperfections in its voice that might make it more suitable for conversation, I think. However, both are better than the browser's built-in TTS. In any case, thanks for your consideration and for this great project!