Closed C0rn3j closed 3 months ago
Hi @C0rn3j
I see you have made a couple of posts, but I'm not sure I will get to them all today.
Re ogg etc, have you checked out the transcoding on AllTalk v2? https://github.com/erew123/alltalk_tts/tree/alltalkbeta
I am assuming you are talking about v1 of AllTalk?
I have not checked out v2 at all, though it seems like v1 is able to generate ogg directly. At least, my own transcodes with ffmpeg resulted in hilariously bigger file sizes than when telling AllTalk to just save in ogg.
Hi @C0rn3j. The actual XTTS model output tensors are in a raw wav format, so although you can name the file something else, you are still getting a wav file, and a second step is required to transcode it. V2 addresses pretty much all the issues here, along with many others, and is a decent jump over v1. I'd advise checking out V2.
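For anyone wanting to do that second transcoding step by hand, here is a minimal sketch in Python. It is not how AllTalk does it internally; it assumes `ffmpeg` is installed on the PATH and built with the `libvorbis` encoder. The helper names `build_ffmpeg_command` and `transcode_to_ogg` are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(wav_path: str, quality: int = 3) -> list[str]:
    """Build an ffmpeg command that transcodes a WAV file to Ogg Vorbis.

    Hypothetical helper, not part of AllTalk. The output path is the
    input path with its suffix swapped to .ogg.
    """
    src = Path(wav_path)
    dst = src.with_suffix(".ogg")
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it exists
        "-i", str(src),        # input WAV
        "-c:a", "libvorbis",   # assumption: ffmpeg was built with libvorbis
        "-q:a", str(quality),  # Vorbis quality level; lower means smaller files
        str(dst),
    ]


def transcode_to_ogg(wav_path: str, quality: int = 3) -> Path:
    """Run ffmpeg on wav_path and return the path of the resulting .ogg file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    cmd = build_ffmpeg_command(wav_path, quality)
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(cmd, check=True, capture_output=True)
    return Path(cmd[-1])
```

Calling `transcode_to_ogg("speech.wav")` would write `speech.ogg` next to the input. The resulting file size depends on the encoder and the quality setting, so it is worth trying a couple of `quality` values on your own audio.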
So the V1 main page just transcodes the wav before saving it, and the gen page + API lack that feature?
Saving as .wav vs .ogg:
EDIT: Yes -> https://github.com/erew123/alltalk_tts/blob/510bc2d1a3aa008d776172486666be5c4a38bcc9/tts_server.py#L566
I want ogg files as output. The Web UI lets you set the filename to something.ogg and generates it fine (though it is neither obvious nor documented that you can do this), but the API only allows an extensionless filename.
Benefits are huge - the difference is a 1MB wav vs 90KB ogg, for example.