platform-kit closed this issue 2 months ago.
Hey @platform-kit, just to be sure: did you try out the new checkpoints?
@akshhack I'm not sure. I used the demo linked in the README yesterday, so if you updated the demo with the new checkpoints, then yes.
Let me know if you can confirm, and please always drop some samples. @NourAlMerey, it might be useful for us to create a report/issue template. Thanks!
@akshhack Good idea, will do.
@akshhack Here's a sample from the Replicate demo:
Input:
We provide several generation candidates when you synthesize text, and attempt to pick the best one on the right.
Ref Audio: https://files.catbox.moe/be6df3.wav
Transcript:
We actually haven't managed to meet demand.
Output: https://files.catbox.moe/07ru0x.mp3
API version: https://replicate.com/camb-ai/mars5-tts/versions/097744a80bc07de9293fd35f9997bb86dbbf68a11a1d98c3e1c2295ee5bb89ab
Hi guys, just bumping this: I was able to ship a demo on Replicate that uses the new weights and returns audio in the browser-based UI.
Here's the sample. As you can see, it still isn't producing correct prosody (notice the mispronunciation of "quality audio"): https://replicate.delivery/pbxt/7XRjuEf1b2QdSSFeTsbFRblEG8ft4nWV3cdfXuK7GeNfFHegJA/output.mp3
Hey @platform-kit, would you like to make the PR to this file? https://github.com/Camb-ai/MARS5-TTS/blob/master/cog/predict.py Otherwise, I'll take it up :)
@arnavmehta7 NP, I'll submit it in the next couple of days.
Tried to generate some outputs using this sentence from the demo's instructions:
We provide several generation candidates when you synthesize text, and attempt to pick the best one on the right.
The word "several" simply WILL NOT come out correctly. It comes out as "seeval," "seeral," "seel," etc.
I'm sure this is a byproduct of being an early release, but I want to flag it now because I think that, in addition to training, there ought to be a way to manually pass in pronunciation data using SSML.
Example:
This way, if the autoregressive model repeatedly guesses incorrectly (e.g. on an unusual name), there is a way to force the right result.
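To make the SSML idea a bit more concrete, here is a rough sketch. SSML 1.1 defines a `<phoneme>` element with `alphabet` and `ph` attributes for exactly this kind of override. The function `extract_pronunciations` below is a hypothetical preprocessing step, not part of MARS5: it strips an SSML string down to plain text and collects per-word pronunciation overrides. Nothing in this thread says how MARS5 would actually consume those overrides, so that part is left out on purpose.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper: MARS5 does not (per this thread) accept SSML.
# It only shows how <phoneme> overrides could be pulled out of SSML
# before passing plain text to a TTS model.
SSML_NS = "{http://www.w3.org/2001/10/synthesis}"


def extract_pronunciations(ssml: str) -> tuple[str, dict[str, str]]:
    """Return (plain_text, overrides) from an SSML <speak> document.

    overrides maps each word wrapped in <phoneme> to its `ph` value.
    The `alphabet` attribute is ignored in this sketch.
    """
    root = ET.fromstring(ssml)
    overrides: dict[str, str] = {}
    parts: list[str] = []

    def walk(elem: ET.Element) -> None:
        # Text directly inside this element, before its first child.
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            # Strip the SSML namespace if present, so both namespaced
            # and bare <phoneme> tags are recognized.
            tag = child.tag.removeprefix(SSML_NS)
            if tag == "phoneme" and child.text:
                overrides[child.text.strip()] = child.get("ph", "")
            walk(child)
            # Text that follows the child's closing tag.
            if child.tail:
                parts.append(child.tail)

    walk(root)
    # Collapse whitespace left over from the markup.
    plain_text = " ".join("".join(parts).split())
    return plain_text, overrides


ssml = (
    '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="en-US">We provide '
    '<phoneme alphabet="ipa" ph="ˈsɛvɹəl">several</phoneme> '
    'generation candidates.</speak>'
)
text, overrides = extract_pronunciations(ssml)
print(text)       # We provide several generation candidates.
print(overrides)  # {'several': 'ˈsɛvɹəl'}
```

A real integration would still need some way to inject those phoneme strings into the model's input, which is the part the request above is really asking for.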