Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip
MIT License

llama text generation stops too early or other problems? #25

Open PasiKoodaa opened 5 months ago

PasiKoodaa commented 5 months ago

It seems that the llama text generation stops too early. I managed to get a long answer with audio and video only once.

From talk-llama-wav2lip.bat:

run : initializing - please wait ...

Llama start prompt: 536/3548 tokens in 1.312 s at 408 t/s
Llama stop words: 'Alex:', 'Alex :', 'Aleks:', 'alex:', '---', 'ALex',
Voice commands: Stop(Ctrl+Space), Regenerate(Ctrl+Right), Delete(Ctrl+Delete), Reset(Ctrl+R)
Start speaking or typing:

Alex: hello
Anna: helloo [Speech/Stop!]

Alex: Tell me a joke
Anna: Why have [Speech/Stop!]

[t: 560]

Alex: Tell me a joke
Anna: Why did [Speech/Stop!]

From xtts_wav2lip.bat:

xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Why did' speaker_wav='Anna' language='en' reply_part=0
speech detected, xtts won't generate
INFO: ::1:56219 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
    response = await func(request)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
    output_file_path = XTTS.process_tts_to_file(
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
    raise e  # Propagate exceptions for endpoint handling.
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
    self.local_generation(clear_text, speaker_name_or_path, speaker_wav, language, output_file)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
    out = self.model.inference(
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\John\miniconda3\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 608, in inference
    "wav": torch.cat(wavs, dim=0).numpy(),
RuntimeError: torch.cat(): expected a non-empty list of Tensors

1715128015.1753356 in server request
2024-05-08 03:26:55.176 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Fuck youuu' speaker_wav='Anna' language='en' reply_part=0
speech detected, xtts won't generate
INFO: ::1:56232 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
[same traceback as above, ending in:]
RuntimeError: torch.cat(): expected a non-empty list of Tensors

PasiKoodaa commented 5 months ago

It starts working when I unplug the microphone and just type the text. So it might be the way the microphone interrupts it? But it doesn't even pick up speech that easily.

PasiKoodaa commented 5 months ago

SOLUTION: I fixed it by changing --vad-start-thold from 0.000270 to 0.000470.
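
For anyone hitting the same thing: as far as I understand it, the start threshold is an energy gate on incoming microphone frames, so raising it means quiet background noise no longer counts as the start of speech and no longer interrupts generation. A rough illustration of the idea (hypothetical code, not the project's actual VAD implementation):

```python
import numpy as np

def speech_started(frame: np.ndarray, vad_start_thold: float = 0.000470) -> bool:
    # Treat a frame as the start of speech when its mean squared amplitude
    # exceeds the threshold; a higher threshold ignores quiet mic noise.
    energy = float(np.mean(frame.astype(np.float32) ** 2))
    return energy > vad_start_thold

# Example: a quiet noise frame vs. a louder speech-like frame (1 s at 16 kHz)
noise = np.random.normal(0.0, 0.01, 16000)   # energy ~0.0001 -> not speech
speech = np.random.normal(0.0, 0.05, 16000)  # energy ~0.0025 -> speech
print(speech_started(noise), speech_started(speech))
```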

Mozer commented 4 months ago

Just today I added a --push-to-talk option (hold Alt to speak). Try it; it can help if the microphone is too sensitive and interrupts generation.
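
For reference, push-to-talk simply gates the microphone on a key: audio frames are only handed to the recognizer while the key is held, so nothing the VAD could misfire on ever reaches it. A hypothetical Python sketch of the idea (the real option lives in the C++ binary; feed_to_whisper here is just a stand-in):

```python
import keyboard  # third-party package: pip install keyboard

def feed_to_whisper(frame):
    # Stand-in for passing the audio frame to speech recognition.
    print(f"transcribing {len(frame)} samples")

def on_mic_frame(frame):
    # Push-to-talk: only forward microphone audio while Alt is held.
    # Everything else is dropped, so background noise can never trigger
    # the VAD and stop llama/xtts generation mid-reply.
    if keyboard.is_pressed("alt"):
        feed_to_whisper(frame)
```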