Support fish.audio - GitHub Issues

thiswillbeyourgithub commented 6 days ago

Hi!

A while ago I heard about a very promising model that you might be interested in. It's called fish.audio.

Here's a YouTube demo: https://www.youtube.com/watch?v=Ghc8cJdQyKQ

Here's their GitHub: https://github.com/fishaudio/fish-speech

They seem to be well documented and, I think, already have code in place for quantization, so it might be easier to support than other models.

Here are their claims:

Zero-shot & Few-shot TTS: Input a 10 to 30-second vocal sample to generate high-quality TTS output. For detailed guidelines, see Voice Cloning Best Practices.

Multilingual & Cross-lingual Support: Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.

No Phoneme Dependency: The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.

Highly Accurate: Achieves a low CER (Character Error Rate) and WER (Word Error Rate) of around 2% for 5-minute English texts.

Fast: With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.

WebUI Inference: Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers.

GUI Inference: Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. See GUI.

Deploy-Friendly: Easily set up an inference server with native support for Linux, Windows, and macOS, minimizing speed loss.

Sadly I don't know C or C++, so I'm just suggesting :)

PABannier commented 4 days ago

Hello @thiswillbeyourgithub ! Thanks for the suggestion, I'll have a look into it ;)

thiswillbeyourgithub commented 4 days ago

Great to hear. Btw, I gave it a try the other day and wrote up how to get started in this issue of opened ai speech :)

PABannier / bark.cpp

Support fish.audio #195