Closed shiffman closed 5 months ago
There are also models on Replicate, but I don't think any of them are suitable for real-time?
I got a valid WAV file from the transformers.js output by following the docs (using the wavefile library):
```js
// transformers.js is ESM-only, so load it with a dynamic import from CommonJS
const { pipeline } = await import('@xenova/transformers');
const wavefile = require('wavefile');
const fs = require('fs/promises');

const synthesizer = await pipeline('text-to-speech', 'Xenova/mms-tts-eng', {
  quantized: false,
});
const output = await synthesizer(txt);

const wav = new wavefile.WaveFile();
wav.fromScratch(1, output.sampling_rate, '32f', output.audio);

const tempFilePath = 'temp_audio.wav';
await fs.writeFile(tempFilePath, wav.toBuffer());
```
However, `play-sound` is unable to play it properly.
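One guess (not verified): some audio players don't handle 32-bit float WAVs. wavefile can convert in place with `wav.toBitDepth('16')` before writing, or the samples can be converted by hand; a minimal sketch, where `floatTo16BitPCM` is a hypothetical helper name:

```javascript
// Convert Float32Array samples in [-1, 1] to 16-bit signed PCM,
// suitable for wav.fromScratch(1, output.sampling_rate, '16', pcm).
function floatTo16BitPCM(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples don't wrap around
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```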
I also got it working with coqui-tts (`tts-server`)! I just can't play the audio dynamically; maybe I should try ffplay.
Merging this but leaving it using the `say` package for now, as it doesn't require a separate server to run.
Here are two TTS options:

- transformers.js - https://huggingface.co/Xenova/mms-tts-eng
- coqui-tts - https://github.com/coqui-ai/TTS.git
I followed these instructions: https://blog.graywind.org/posts/coqui-tts-mac/
And am running the server.
I'm having some trouble getting the transformers.js output to play, maybe a different sampling rate?
This is the node package I'm running: https://www.npmjs.com/package/play-sound (cc @Xenova)
A transformers.js solution would be great, but the TTS server runs super fast, and I believe it will allow me to customize the voice / train my own voice model?
Feedback and suggestions welcome!