Mozer / talk-llama-fast

Port of OpenAI's Whisper model in C/C++ with xtts and wav2lip
MIT License

No audio or video. #18

Closed. Gnoomer closed this issue 3 months ago.

Gnoomer commented 3 months ago

I launched talk-llama-wav2lip-ru.bat, and only the text output worked. I tried restarting SillyTavern and xtts, but that doesn't seem to help; it says there are no speakers. Help, please. I use Windows 10 on a PC with a 4070 Ti and 16 GB RAM. Here is the output of xtts:

(xtts) C:\Windows\system32>python -m xtts_api_server --bat-dir %~dp0 -d=cuda --deepspeed --stream-to-wavs --call-wav2lip --output C:\Windows\System32\SillyTavern-Extras\tts_out\ --extras-url http://127.0.0.1:5100/ --wav-chunk-sizes=10,20,40,100,200,300,400,9999
2024-04-15 13:57:48.282 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-15 13:57:48.283 | INFO | xtts_api_server.server:<module>:76 - Model: 'v2.0.2' starts to load,wait until it loads
[2024-04-15 13:58:01,165] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-15 13:58:01,457] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-15 13:58:01,647] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-15 13:58:01,648] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-15 13:58:01,649] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-15 13:58:01,855] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-15 13:58:02.325 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 0 speakers.
2024-04-15 13:58:02.326 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
INFO: Started server process [2164]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178894.8860521 in server request
2024-04-15 14:01:34.886 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='Что ты говоришь' speaker_wav='default' language='ru' reply_part=0
INFO: ::1:58595 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
    response = await func(request)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
    output_file_path = XTTS.process_tts_to_file(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
    raise e  # Propagate exceptions for endpoint handling.
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 548, in process_tts_to_file
    speaker_wav = self.get_speaker_wav(speaker_name_or_path)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 540, in get_speaker_wav
    raise ValueError(f"Speaker {speaker_name_or_path} not found.")
ValueError: Speaker default not found.
voice Anna(speakers/Anna.wav) is not found, switching to 'default'
1713178895.9273908 in server request

Mozer commented 3 months ago

It can't find the directory with the speaker wavs. I think you are running it from cmd. Instead, simply double-click xtts_wav2lip.bat, or open a cmd window in the directory where the bat is. XTTS looks for the \speakers\ dir relative to the current working directory.
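For example, just as a sketch (the path below is a placeholder, use whatever folder actually holds xtts_wav2lip.bat and the speakers\ subfolder on your machine):

rem placeholder path: the folder that contains xtts_wav2lip.bat and speakers\
cd /d C:\talk-llama-fast\xtts
rem this should list Anna.wav and the other voice samples
dir speakers
rem then start the server the same way the bat does
xtts_wav2lip.bat

Double-clicking xtts_wav2lip.bat in Explorer has the same effect, because Explorer sets the working directory to the bat's own folder before running it.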

Gnoomer commented 3 months ago

(screenshots attached: explorer_v5ef0G1zyb, explorer_hL2OEym3qW)

Here it is. I had been running the commands by copying them from the .bat files into the Conda environment; now I added Conda to PATH and launched everything from the .bat files. xtts downloaded some files and found 4 speakers, but I keep getting the same issue. Here's the output:

W:\GOVORILKA\xtts>call conda activate xtts
2024-04-15 16:18:02.713 | INFO | xtts_api_server.tts_funcs:create_directories:283 - Folder in the path W:\GOVORILKA\xtts\xtts_models has been created
2024-04-15 16:18:02.715 | INFO | xtts_api_server.modeldownloader:upgrade_tts_package:80 - TTS will be using 0.22.0 by Mozer
2024-04-15 16:18:02.716 | INFO | xtts_api_server.server:<module>:76 - Model: 'v2.0.2' starts to load,wait until it loads
[XTTS] Downloading config.json...
100%|████████████████████████████████████████████████████████████████████████████| 4.36k/4.36k [00:00<00:00, 4.36MiB/s]
[XTTS] Downloading model.pth...
100%|████████████████████████████████████████████████████████████████████████████| 1.86G/1.86G [00:56<00:00, 32.7MiB/s]
[XTTS] Downloading vocab.json...
100%|██████████████████████████████████████████████████████████████████████████████| 335k/335k [00:00<00:00, 1.01MiB/s]
[XTTS] Downloading speakers_xtts.pth...
100%|████████████████████████████████████████████████████████████████████████████| 7.75M/7.75M [00:00<00:00, 22.7MiB/s]
[2024-04-15 16:19:16,118] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-15 16:19:16,489] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[2024-04-15 16:19:16,689] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.11.2+unknown, git-hash=unknown, git-branch=unknown
[2024-04-15 16:19:16,689] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-04-15 16:19:16,690] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-04-15 16:19:16,690] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2024-04-15 16:19:16,915] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
2024-04-15 16:19:17.455 | INFO | xtts_api_server.tts_funcs:load_model:190 - Pre-create latents for all current speakers
2024-04-15 16:19:17.456 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Anna: speakers/Anna.wav
2024-04-15 16:19:19.815 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for default: speakers/default.wav
2024-04-15 16:19:19.852 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Google: speakers/Google.wav
2024-04-15 16:19:19.921 | INFO | xtts_api_server.tts_funcs:get_or_create_latents:259 - creating latents for Kurt Cobain: speakers/Kurt Cobain.wav
2024-04-15 16:19:19.987 | INFO | xtts_api_server.tts_funcs:create_latents_for_all:270 - Latents created for all 4 speakers.
2024-04-15 16:19:19.987 | INFO | xtts_api_server.tts_funcs:load_model:193 - Model successfully loaded
C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\pydantic\_internal\_fields.py:160: UserWarning: Field "model_name" has conflict with protected namespace "model_".

You may be able to resolve this warning by setting model_config['protected_namespaces'] = ().
  warnings.warn(
INFO: Started server process [16180]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://localhost:8020 (Press CTRL+C to quit)
1713187201.829053 in server request
2024-04-15 16:20:01.829 | INFO | xtts_api_server.server:tts_to_audio:337 - Processing TTS to audio with request: text='А ты что' speaker_wav='Anna' language='ru' reply_part=0

Free memory : 6.095509 (GigaBytes)
Total memory: 11.993530 (GigaBytes)
Requested memory: 0.335938 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0000000790000000

INFO: ::1:63734 - "POST /tts_to_audio/ HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\starlette\routing.py", line 72, in app
    response = await func(request)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\fastapi\routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\server.py", line 347, in tts_to_audio
    output_file_path = XTTS.process_tts_to_file(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 609, in process_tts_to_file
    raise e  # Propagate exceptions for endpoint handling.
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 598, in process_tts_to_file
    self.local_generation(clear_text,speaker_name_or_path,speaker_wav,language,output_file)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\xtts_api_server\tts_funcs.py", line 495, in local_generation
    out = self.model.inference(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\TTS\tts\models\xtts.py", line 699, in inference
    torchaudio.save(output_file, torch.tensor(wav_tensor).unsqueeze(0), 24000, encoding="PCM_U")
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\utils.py", line 312, in save
    return backend.save(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\soundfile.py", line 44, in save
    soundfile_backend.save(
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\torchaudio\_backend\soundfile_backend.py", line 457, in save
    soundfile.write(file=filepath, data=src, samplerate=sample_rate, subtype=subtype, format=format)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 343, in write
    with SoundFile(file, 'w', samplerate, channels,
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "C:\Users\Hyena\.conda\envs\xtts\Lib\site-packages\soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening 'C:\\Windows\\System32\\SillyTavern-Extras\\tts_out\\out_1.wav': System error.
1713187202.9234974 in server request

Mozer commented 3 months ago

I see C:\Windows\System32\SillyTavern-Extras\tts_out\out_1.wav

I don't think you actually installed Extras to /system32. Please edit xtts_wav2lip.bat and change the --output param to the full path of the tts_out dir inside your SillyTavern-Extras folder. Don't forget the trailing slashes.
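For example, if Extras were installed at W:\GOVORILKA\SillyTavern-Extras (a made-up path here, substitute the real location of your SillyTavern-Extras folder), the xtts launch line in xtts_wav2lip.bat would look roughly like this, with only --output changed from the stock bat:

python -m xtts_api_server --bat-dir %~dp0 -d=cuda --deepspeed --stream-to-wavs --call-wav2lip --output W:\GOVORILKA\SillyTavern-Extras\tts_out\ --extras-url http://127.0.0.1:5100/ --wav-chunk-sizes=10,20,40,100,200,300,400,9999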

Gnoomer commented 3 months ago

I actually had installed it there. I've now moved it to the same folder as xtts, and everything works! Thanks a lot for your help!