Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip
MIT License

The voice (and video) cuts out early and doesn't complete the response; would love the SillyTavern instructions. #11

Open tomstur1 opened 2 months ago

tomstur1 commented 2 months ago

XTTS seems to cut out early, before the response is finished. Setting chunks with --wav-chunk-sizes=100,200,300,400,9999 didn't help.

SillyTavern proper with Koboldcpp.exe and another model, with Extras enabled but without the video encoder (talk-llama-wav2lip.bat), has no problems with XTTS.

Would love full SillyTavern instructions to get the video working -- I don't care much for fast-llama.

Mozer commented 2 months ago

You need to run both xtts_wav2lip.bat and my modified silly_extras to make it work with SillyTavern. Also, don't forget the params --stream-to-wavs --call-wav2lip in xtts_wav2lip.bat; they are needed too. Turn off streaming for koboldcpp and for XTTS in SillyTavern. Streaming is not yet supported in ST for wav2lip.
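For reference, the setup above might look roughly like this as a pair of launcher lines. This is only a sketch based on the flags mentioned in this thread; the actual script names, paths, and any extra arguments in your local xtts_wav2lip.bat and silly_extras may differ:

```shell
REM Hypothetical sketch of the two launchers Mozer describes.
REM Paths and extra options are assumptions -- check your own .bat files.

REM 1) Start the XTTS + wav2lip server with the required streaming params:
REM    --stream-to-wavs and --call-wav2lip must both be present,
REM    and chunk sizes can be tuned via --wav-chunk-sizes.
start "" xtts_wav2lip.bat --stream-to-wavs --call-wav2lip --wav-chunk-sizes=100,200,300,400,9999

REM 2) Start the modified Silly Extras server so SillyTavern can reach it:
start "" silly_extras.bat

REM Then, in SillyTavern itself (not on the command line):
REM   - point the XTTS extension at the xtts_wav2lip service
REM   - disable streaming for XTTS
REM   - disable streaming for koboldcpp
```

The key point is that both processes must be running at the same time, and streaming must be off on the SillyTavern side, since wav2lip does not yet support streamed responses through ST.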

I have tested --wav-chunk-sizes=100,200,300,400,9999 and found no problem.

tomstur1 commented 2 months ago

With SillyTavern connected to Extras (silly_extras.bat) and xtts_wav2lip.bat, it works, but only for a single conversation (the first one); then it freezes until a new chat session is created. Odd. I enabled XTTSv2 and connected the xtts_wav2lip service to it. I had to disable streaming for XTTS in SillyTavern. But at least I got it to say what I want it to say. Fun stuff.

(using Koboldcpp)

Mozer commented 2 months ago

Weird. Any errors?