erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, a narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0
1.05k stars, 113 forks

Stream request not working on iOS devices or on Safari #284

Closed: toanbot closed this 3 months ago

toanbot commented 3 months ago

I have tested various iOS devices and found that the stream request cannot play audio in any browser on them, although it plays normally on Android devices and in PC browsers (except for Safari).

erew123 commented 3 months ago

Hi @toanbot

Best I can tell, Apple has dropped support for certain types of wav files over recent iOS revisions and within Safari. I suspect the problem is related to that.

AllTalk, when streaming, relies on Coqui's scripts, which generate wav audio directly from the AI model. The streamed wav doesn't contain a header like a standard wav file would, because when streaming starts the file hasn't been fully generated, so a header stating how long the file will be (and so on) cannot be written. I have tried a few methods in the past to generate a false header, but I've had no success in doing so. There is no way to transcode the audio during streaming generation, as it's an incomplete file with no header.
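To illustrate what a "false header" means here, this is a minimal sketch of a 44-byte PCM wav header whose two length fields are filled with a placeholder because the final size is unknown while streaming. It is not AllTalk's or Coqui's code. The function name `streaming_wav_header` is hypothetical, and the 24 kHz, mono, 16-bit defaults are assumptions. Whether Safari or iOS would accept such a header is untested, and as noted above, earlier attempts along these lines did not work.

```python
import struct


def streaming_wav_header(sample_rate=24000, channels=1, bits_per_sample=16):
    """Build a 44-byte PCM wav header with placeholder length fields.

    Hypothetical helper: the defaults are assumed values, not taken from AllTalk.
    """
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second of audio
    unknown = 0xFFFFFFFF  # placeholder: the real length is not known yet

    return b"".join([
        b"RIFF", struct.pack("<I", unknown), b"WAVE",
        b"fmt ", struct.pack(
            "<IHHIIHH",
            16,               # size of the fmt chunk body
            1,                # audio format 1 = uncompressed PCM
            channels,
            sample_rate,
            byte_rate,
            block_align,
            bits_per_sample,
        ),
        b"data", struct.pack("<I", unknown),
    ])


header = streaming_wav_header()
print(len(header))  # 44
```

In a streaming response, bytes like these would be sent once before the first audio chunk. The two `unknown` fields are exactly the information a complete file would carry and a stream cannot.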

Non-streaming generation fully generates the wav file before presenting it, so the header is complete and it is less likely to be an issue.
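For contrast, here is a small sketch using Python's standard `wave` module, which fills in the header lengths when the file is closed. A player can then read the duration straight from the header. The helper name `complete_wav_bytes` and the 24 kHz mono 16-bit format are assumptions for illustration, not AllTalk's actual code.

```python
import io
import wave


def complete_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a complete wav file (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)   # bytes per sample, 2 = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)             # header sizes are patched on close
    return buf.getvalue()


pcm = b"\x00\x00" * 24000               # one second of 16-bit mono silence
wav_bytes = complete_wav_bytes(pcm)

# The length is now recoverable from the header alone.
with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
    duration = wf.getnframes() / wf.getframerate()

print(duration)  # 1.0
```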

I suspect that Apple may restrict 3rd parties from supporting other playback methods or file types (such as wav files that are not fully rendered). This is only a guess, based on the fact that I know Apple can be quite restrictive within its development ecosystem.