Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip
MIT License
708 stars · 64 forks

Whisper Transcription not working #5

Closed · Vander1nde21 closed this 3 months ago

Vander1nde21 commented 3 months ago

So I've done all the steps, but when I ran talk-llama-wav2lip.bat it said that cudart64_110.dll, cublasLt64_11.dll and cublas64_11.dll were missing. So I went online, downloaded those DLLs, dropped them in the main directory, and it worked; no errors after that. I then tweaked the parameters to my liking, along with the audio input device. However, when I speak, nothing is transcribed. I feel like this may be an issue with Whisper.

Mozer commented 3 months ago
  1. Check that the mic works in some other program, e.g. Audacity, and that it is set as the default device in Windows. If there are no visible errors, I guess the problem is with the mic itself; try another mic.

  2. One more guess: non-Latin symbols in the mic name can cause problems. Try copying the command out of talk-llama-wav2lip.bat and pasting it into a cmd window opened in the directory where the .bat files are, then run it. That can work around problems with non-Latin symbols (see the sketch below).
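
A minimal sketch of that, assuming a default folder layout; the install path, exe name, and arguments here are placeholders for whatever is actually inside your talk-llama-wav2lip.bat:

    rem open cmd, go to the folder with the .bat files (path is illustrative)
    cd /d C:\talk-llama-fast
    rem paste the full command line copied from talk-llama-wav2lip.bat, e.g.:
    talk-llama.exe <arguments copied from the .bat>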

P.S. About CUDA: did you install the CUDA Toolkit, or just copy the DLLs? I think we need to add some info about the CUDA package to the readme.

Vander1nde21 commented 3 months ago

I used Audacity and other software and the mic is working fine. My mic is simply named (USB AUDIO DEVICE) and has no non-Latin symbols. Also, yeah, I installed the CUDA Toolkit setup installer from NVIDIA just to see if anything would happen, but the DLL errors still persisted. But when I downloaded the cudart64_110.dll, cublasLt64_11.dll and cublas64_11.dll files from dll-files.com, the errors went away.

[Screenshot 2024-04-07 201751]

Mozer commented 3 months ago

Try enabling "Listen to this device" in the mic settings. It might help, or at least surface the problem. After that, restart talk-llama. [Image2]

Vander1nde21 commented 3 months ago

Just enabled it. I can hear my own voice, but still nothing is transcribed. [Screenshot 2024-04-07 211433]

Mozer commented 3 months ago

Try running from cmd instead of the .bat: copy and paste the command, with cmd opened in the same directory. It helped one guy whose AirPods mic wasn't working. Keep "Listen to this device" enabled.

Vander1nde21 commented 3 months ago

Yeah, just tried that. Ran cmd from the directory where all the .bat files are, with "Listen to this device" still enabled, and still no luck.

Mozer commented 3 months ago

What version of Windows are you using? Can you try with some other mic? If you don't have one, you can use the "WO Mic" Android app to pass mic sound from your phone to Windows.

Vander1nde21 commented 3 months ago

I'm on Windows 11. OK, strangely, it seems to detect my voice now. But it would be great if I could use my own mic, since its audio is so much better than having to keep using a virtual mic through my phone; I'll probably have to stick with this in the meantime. But I'm now encountering an error: wav2lip won't generate a video, because the XTTS server just stops after I speak, and it keeps showing this error every time I speak.

Here's what shows up on the XTTS server:

    Stream stopped successfully!
    speech detected, xtts won't generate
    INFO:     ::1:55812 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
    ERROR:    Exception in ASGI application
    Traceback (most recent call last):
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
        result = await app(  # type: ignore[func-returns-value]
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
        return await self.app(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
        await super().__call__(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
        await self.middleware_stack(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
        raise exc
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
        await self.app(scope, receive, _send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
        await self.app(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
        await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
        raise exc
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
        await app(scope, receive, sender)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
        await self.middleware_stack(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
        await route.handle(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
        await self.app(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
        await wrap_app_handling_exceptions(app, request)(scope, receive, send)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
        raise exc
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
        await app(scope, receive, sender)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
        response = await func(request)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
        raw_response = await run_endpoint_function(
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
        return await dependant.call(**values)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 346, in tts_to_audio
        output_file_path = XTTS.process_tts_to_file(
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
        raise e  # Propagate exceptions for endpoint handling.
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
        self.local_generation(clear_text, speaker_name_or_path, speaker_wav, language, output_file)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
        out = self.model.inference(
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\Miniconda3\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 609, in inference
        "wav": torch.cat(wavs, dim=0).numpy(),
    RuntimeError: torch.cat(): expected a non-empty list of Tensors

Mozer commented 3 months ago

Try restarting everything. This error is non-critical; it just says that xtts won't generate audio because it detected noise (your speech). But if there's no noise and xtts still says there is, that's a problem. What version of talk-llama-fast are you using?

In the talk-llama-fast-v0.1.3 exe I updated the location of the xtts control file xtts_play_allowed.txt (it's now in the temp dir). Maybe you are using the new exe with the old xtts; the old xtts can't find the new xtts_play_allowed.txt. Simple solution: use the old 0.1.2 exe with my old xtts.

And you can turn off the 'stop on speech' feature completely by passing --vad-start-thold 0 in talk-llama-wav2lip.bat.
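
A minimal sketch of where the flag goes; the exe name and the placeholder arguments stand in for whatever your .bat already contains:

    rem in talk-llama-wav2lip.bat, append the flag to the existing command line:
    talk-llama.exe <your existing arguments> --vad-start-thold 0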

Vander1nde21 commented 3 months ago

OK, I updated Miniconda and did a completely clean install, and it's working better than it did before; that xtts error is gone now. However, it's wav2lip that has an error this time.

This is what the silly_extras.bat wav2lip server says each time text is generated:

    in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 42, chunk_needed: 42, reply: 3
    Error: tts_out/out_42.wav file is not found

NOTE: once silly_extras.bat starts, I get a few errors that mention ffmpeg; maybe that could be the problem.

[image]

Mozer commented 3 months ago

"Error: tts_out/out_42.wav file is not found" means the /tts_out/ dir set in xtts_wav2lip.bat is wrong. Check the path; it must end with 2 slashes. But as far as I can see in your screenshot, it can find your tts wavs (not sure).
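
For example, with the doubled trailing backslash (the directory itself is illustrative) the argument would look like:

    --output C:\path\to\SillyTavern-Extras\tts_out\\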

Did you copy openh264-1.8.0-win64.dll to /system32 or the /ffmpeg/bin dir? If you didn't, copy it; if you did, delete it. Maybe it's causing the error with the video writer.

Vander1nde21 commented 3 months ago

Ah yes, those first two ffmpeg and openh264 errors have disappeared now. But in talk-llama-wav2lip.bat I set --vad-start-thold 0, and I still get this error:

    in wav2lip_server_generate: is busy: 0, face_detect_running: 0, chunk: 1, chunk_needed: 1, reply: 0
    Error: tts_out/out_1.wav file is not found
    127.0.0.1 - - [09/Apr/2024 08:55:17] "GET /api/wav2lip/generate/Default/cuda/out_1/latest/1/0 HTTP/1.1" 200 -
    1712616918.2462685 in wav2lip gen server chunk:2_1

Here is how my /tts_out/ dir is set in xtts_wav2lip.bat:

    --output B:\TalkLlamaFastDirectory\xtts\SillyTavern-Extras\tts_out\

Also, I changed it to two slashes and the error remains.

Vander1nde21 commented 3 months ago

Never mind, the path was incorrect. Changed it and it's all fixed now. Thanks so much @Mozer!

Mozer commented 3 months ago

The missing DLLs were caused by a wrong CUDA Toolkit version. You need CUDA Toolkit 11.x; a 12.x install will cause missing-DLL errors. I have updated the manual: https://developer.nvidia.com/cuda-11-8-0-download-archive
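
A quick way to check which toolkit is actually being picked up, assuming the toolkit's bin directory was added to PATH by the installer:

    nvcc --version
    rem should report "Cuda compilation tools, release 11.x"
    where cudart64_110.dll cublas64_11.dll cublasLt64_11.dll
    rem each should resolve to ...\CUDA\v11.x\bin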