CodingTrain / Bizarro-Devin


Experimenting with TTS #30

Closed (shiffman closed this 5 months ago)

shiffman commented 5 months ago

Here are two TTS options.

  1. transformers.js - https://huggingface.co/Xenova/mms-tts-eng

  2. coqui-tts - https://github.com/coqui-ai/TTS.git

I followed these instructions: https://blog.graywind.org/posts/coqui-tts-mac/

I am running the server with:

tts-server --model_name tts_models/en/ljspeech/vits
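A minimal sketch of requesting speech from that server in Node, assuming the server's default port 5002 and an `/api/tts?text=...` endpoint that returns WAV bytes (worth double-checking against your Coqui TTS version). It needs Node 18+ for the global `fetch`; `buildTtsUrl` and `synthesize` are hypothetical names.

```javascript
// Assumed default address of `tts-server` (port 5002); change it if you
// started the server with a different --port.
const TTS_SERVER = 'http://localhost:5002';

// Build the GET URL for the server's /api/tts endpoint.
// URLSearchParams takes care of escaping spaces and punctuation.
function buildTtsUrl(text, base = TTS_SERVER) {
  const url = new URL('/api/tts', base);
  url.searchParams.set('text', text);
  return url.toString();
}

// Request speech and return the response body (expected to be WAV) as a Buffer.
// Needs Node 18+ for the global fetch.
async function synthesize(text, base = TTS_SERVER) {
  const res = await fetch(buildTtsUrl(text, base));
  if (!res.ok) {
    throw new Error(`tts-server responded with HTTP ${res.status}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Inside an async function, something like `require('fs').writeFileSync('speech.wav', await synthesize('Hello!'))` would save the result for any player to pick up.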

I'm having trouble getting the transformers.js output to play; maybe it uses a different sampling rate?
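One way to test the sampling-rate guess, assuming the output has been saved as a WAV file, is to read the file's header. This is a minimal sketch that assumes the canonical RIFF/WAVE layout, where the `fmt ` chunk starts right after the 12-byte RIFF header (files with other chunks first would need a full chunk walker); `readWavInfo` is a hypothetical helper.

```javascript
// Read basic format fields from a WAV buffer.
// Assumes the canonical layout: "RIFF" header, then the "fmt " chunk at byte 12.
function readWavInfo(buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file');
  }
  if (buf.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('fmt chunk is not at offset 12; a full chunk parser is needed');
  }
  const formatCode = buf.readUInt16LE(20); // 1 = integer PCM, 3 = IEEE float
  return {
    format: formatCode === 3 ? 'float' : formatCode === 1 ? 'pcm' : `code ${formatCode}`,
    channels: buf.readUInt16LE(22),
    sampleRate: buf.readUInt32LE(24),
    bitsPerSample: buf.readUInt16LE(34),
  };
}
```

For example, `console.log(readWavInfo(require('fs').readFileSync('temp_audio.wav')))` shows the rate and sample format the player is being handed, which can then be compared with what the player expects.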

This is the Node package I'm using for playback: https://www.npmjs.com/package/play-sound (cc @Xenova)
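A small sketch of wrapping play-sound's callback-style `play(file, callback)` in a Promise, so playback can be awaited and errors surface instead of being silently dropped. The player object is passed in (as returned by `require('play-sound')(opts)`), which also makes it easy to force a specific backend through play-sound's `player` option when debugging. `playFile` is a hypothetical name.

```javascript
// Wrap play-sound's callback API in a Promise so playback can be awaited.
// `player` is the object returned by require('play-sound')(opts); any object
// with a compatible play(file, callback) method works.
function playFile(player, file) {
  return new Promise((resolve, reject) => {
    player.play(file, (err) => {
      if (err) reject(err); // e.g. the chosen player exited with an error
      else resolve();
    });
  });
}
```

For example, `await playFile(require('play-sound')({ player: 'afplay' }), 'temp_audio.wav')` would try one specific player (assuming `afplay` is available, as on macOS), which helps tell a file problem apart from a player problem.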

A transformers.js solution would be great, but the TTS server runs very fast, and I believe it will let me customize the voice or train my own voice model?

Feedback and suggestions welcome!

shiffman commented 5 months ago

There are also models on Replicate, but I don't think any of them are suitable for real-time use?

https://replicate.com/collections/text-to-speech

dipamsen commented 5 months ago

I got a valid WAV file from the transformers.js output by following the docs (using the wavefile library):

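// Run as an ES module (.mjs or "type": "module") so imports and top-level await work
import { pipeline } from '@xenova/transformers';
import wavefile from 'wavefile';
import fs from 'fs/promises';

const txt = 'Hello, world!'; // placeholder: the text to synthesize

// Load the MMS English TTS model with full-precision (unquantized) weights
const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng', {
  quantized: false,
});

// output.audio is a Float32Array of samples; output.sampling_rate is the model's rate
const output = await synthesizer(txt);

// Wrap the 32-bit float samples in a mono WAV container
const wav = new wavefile.WaveFile();
wav.fromScratch(1, output.sampling_rate, '32f', output.audio);

const tempFilePath = 'temp_audio.wav';
await fs.writeFile(tempFilePath, wav.toBuffer());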

However, play-sound is unable to play it properly.
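One untested hypothesis is that the player behind play-sound doesn't handle 32-bit float WAV (format code 3), which the `'32f'` setting produces. A minimal sketch of writing the same mono samples as 16-bit integer PCM instead, which players support more widely; `encodeWav16` is a hypothetical helper, and wavefile's own bit-depth conversion (`toBitDepth`) may achieve the same thing.

```javascript
// Encode mono float samples in [-1, 1] as a 16-bit PCM WAV file (44-byte header).
function encodeWav16(samples, sampleRate) {
  const dataBytes = samples.length * 2;
  const buf = Buffer.alloc(44 + dataBytes);
  buf.write('RIFF', 0);
  buf.writeUInt32LE(36 + dataBytes, 4);  // size of everything after this field
  buf.write('WAVE', 8);
  buf.write('fmt ', 12);
  buf.writeUInt32LE(16, 16);             // fmt chunk size for plain PCM
  buf.writeUInt16LE(1, 20);              // format code 1 = integer PCM
  buf.writeUInt16LE(1, 22);              // channels: mono
  buf.writeUInt32LE(sampleRate, 24);
  buf.writeUInt32LE(sampleRate * 2, 28); // byte rate = rate * channels * 2 bytes
  buf.writeUInt16LE(2, 32);              // block align = channels * 2 bytes
  buf.writeUInt16LE(16, 34);             // bits per sample
  buf.write('data', 36);
  buf.writeUInt32LE(dataBytes, 40);
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), 44 + i * 2);
  }
  return buf;
}
```

With the transformers.js output from the snippet above, this would be used as `await fs.writeFile(tempFilePath, encodeWav16(output.audio, output.sampling_rate))`.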

dipamsen commented 5 months ago

I also got it working with coqui-tts (tts-server)! I just can't play the audio dynamically yet; maybe I should try ffplay.
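A minimal sketch of the ffplay idea: pipe WAV bytes that are already in memory straight into `ffplay` over stdin, so no temp file is needed. It assumes `ffplay` (part of FFmpeg) is on the PATH; `-nodisp` suppresses the video window, `-autoexit` quits when playback ends, and `pipe:0` is FFmpeg's stdin input. `ffplayArgs` and `playBuffer` are hypothetical names.

```javascript
const { spawn } = require('child_process');

// Arguments for a headless ffplay that reads from stdin and exits when done.
function ffplayArgs() {
  return ['-nodisp', '-autoexit', '-loglevel', 'error', 'pipe:0'];
}

// Play an in-memory audio Buffer; resolves when ffplay exits with code 0.
function playBuffer(audio) {
  return new Promise((resolve, reject) => {
    const proc = spawn('ffplay', ffplayArgs(), { stdio: ['pipe', 'ignore', 'inherit'] });
    proc.on('error', reject);       // e.g. ffplay is not installed
    proc.stdin.on('error', reject); // e.g. EPIPE if ffplay exits early
    proc.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`ffplay exited with code ${code}`));
    });
    proc.stdin.end(audio);
  });
}
```

For example, `await playBuffer(require('fs').readFileSync('temp_audio.wav'))` plays a saved file the same way; a Buffer received from the TTS server could be passed in directly.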

shiffman commented 5 months ago

Merging this, but leaving it on the say package for now, since that doesn't require a separate server to run.
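For the say package, a small sketch of awaiting `say.speak(text, voice, speed, callback)` so that several lines are spoken one at a time. The module object is passed in (as returned by `require('say')`), and `null` for voice and speed selects the system defaults; `speak` and `speakAll` are hypothetical helpers.

```javascript
// Promise wrapper around say.speak(text, voice, speed, callback).
// `say` is the module object from require('say').
function speak(say, text, voice = null, speed = null) {
  return new Promise((resolve, reject) => {
    say.speak(text, voice, speed, (err) => (err ? reject(err) : resolve()));
  });
}

// Speak several lines in order, waiting for each to finish before the next.
async function speakAll(say, lines) {
  for (const line of lines) {
    await speak(say, line);
  }
}
```

For example, `await speakAll(require('say'), ['Hello.', 'I am Bizarro Devin.'])` would speak the two lines back to back.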