JarodMica / ai-voice-cloning

GNU General Public License v3.0

Tensor error #121

Open Supercar2018 opened 3 weeks ago

Supercar2018 commented 3 weeks ago

System: Ryzen 7 2700X, GTX 1080, 16 GB DDR4, CUDA 12.5. I get this error every time I click Generate:

Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list.

Even after I click recompute voice latents, I still get the same error.

Full details from two runs (96 and then 2 autoregressive samples):

[1/1] Generating line: Your prompt here.
Loading voice: random with model d1f79232
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 2.0, 'cond_free_k': 2.0, 'num_autoregressive_samples': 96, 'sample_batch_size': 2, 'diffusion_iterations': 80, 'voice_samples': None, 'conditioning_latents': (tensor([[-0.8391, 1.4016, 1.3422, ..., 2.5526, 0.1944, 4.2360]]), tensor([[-1.2027, -1.1897, -0.7371, ..., -0.1231, -0.2029, 0.0296]])), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0, 'autoregressive_model': None, 'diffusion_model': None, 'tokenizer_json': None}
Traceback (most recent call last):
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 1247, in generate_tortoise
    gen, additionals = tts.tts(cut_text, **settings )
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 869, in tts
    best_latents = self.autoregressive(auto_conditioning.repeat(k, 1), text_tokens.repeat(k, 1),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 500, in forward
    text_logits, mel_logits = self.get_logits(conds, text_emb, self.text_head, mel_emb, self.mel_head, get_attns=return_attentions, return_latent=return_latent)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 419, in get_logits
    emb = torch.cat([speech_conditioning_inputs, first_inputs, second_inputs], dim=1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 96 for tensor number 2 in the list.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\blocks.py", line 1710, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\blocks.py", line 1250, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\utils.py", line 693, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\utils.py", line 693, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\webui.py", line 134, in generate_proxy
    raise e
  File "D:\TTS AI\ai-voice-cloning\src\webui.py", line 128, in generate_proxy
    sample, outputs, stats = generate(**kwargs)
                             ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 368, in generate
    return generate_tortoise(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 1250, in generate_tortoise
    raise RuntimeError(f'Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: {e}')
RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 96 for tensor number 2 in the list.

[1/1] Generating line: Your prompt here.
Loading voice: random with model d1f79232
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 2.0, 'cond_free_k': 2.0, 'num_autoregressive_samples': 2, 'sample_batch_size': 2, 'diffusion_iterations': 30, 'voice_samples': None, 'conditioning_latents': (tensor([[ 0.9920, 1.7382, -2.2513, ..., 0.7594, 4.1982, 3.8239]]), tensor([[-0.9937, -0.7718, -0.9986, ..., -0.2338, 0.2262, 0.2678]])), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0, 'autoregressive_model': None, 'diffusion_model': None, 'tokenizer_json': None}
Traceback (most recent call last):
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 1247, in generate_tortoise
    gen, additionals = tts.tts(cut_text, **settings )
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\api.py", line 869, in tts
    best_latents = self.autoregressive(auto_conditioning.repeat(k, 1), text_tokens.repeat(k, 1),
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 500, in forward
    text_logits, mel_logits = self.get_logits(conds, text_emb, self.text_head, mel_emb, self.mel_head, get_attns=return_attentions, return_latent=return_latent)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\modules\tortoise-tts\tortoise\models\autoregressive.py", line 419, in get_logits
    emb = torch.cat([speech_conditioning_inputs, first_inputs, second_inputs], dim=1)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\queueing.py", line 501, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\route_utils.py", line 258, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\blocks.py", line 1710, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\blocks.py", line 1250, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\utils.py", line 693, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\venv\Lib\site-packages\gradio\utils.py", line 693, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\webui.py", line 134, in generate_proxy
    raise e
  File "D:\TTS AI\ai-voice-cloning\src\webui.py", line 128, in generate_proxy
    sample, outputs, stats = generate(**kwargs)
                             ^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 368, in generate
    return generate_tortoise(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\TTS AI\ai-voice-cloning\src\utils.py", line 1250, in generate_tortoise
    raise RuntimeError(f'Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: {e}')
RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Sizes of tensors must match except in dimension 1. Expected size 1 but got size 2 for tensor number 2 in the list.
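
For anyone reading the traceback: the failing call is a `torch.cat(..., dim=1)` over three tensors, and concatenation requires every dimension other than the one being joined to agree. In both runs the reported "got size" (96, then 2) equals `num_autoregressive_samples`, while the expected size is 1, which suggests the third tensor arrives with a larger batch dimension than the first two. The snippet below is only a pure-Python sketch of that shape rule so it runs without PyTorch. `cat_shape` is a hypothetical helper, and every shape value except the batch sizes 1 and 96 is made up for illustration.

```python
# Illustrative only: a pure-Python stand-in for the shape rule that a
# concatenation along one dimension enforces. `cat_shape` is a hypothetical
# helper, not part of PyTorch or tortoise-tts.

def cat_shape(shapes, dim):
    """Return the shape that results from concatenating `shapes` along `dim`.

    Raises ValueError if any other dimension differs from the first shape.
    """
    first = shapes[0]
    total = 0
    for index, shape in enumerate(shapes):
        if len(shape) != len(first):
            raise ValueError(
                f"tensor number {index} has {len(shape)} dimensions, "
                f"expected {len(first)}"
            )
        for axis, (expected, got) in enumerate(zip(first, shape)):
            # Only the concatenation axis is allowed to differ.
            if axis != dim and expected != got:
                raise ValueError(
                    f"Sizes of tensors must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} "
                    f"for tensor number {index} in the list."
                )
        total += shape[dim]
    result = list(first)
    result[dim] = total
    return tuple(result)


# Assumed layout (batch, sequence, channels). Sequence and channel sizes are
# invented; only the batch sizes 1 and 96 are taken from the log above.
conds = (1, 2, 1024)
text_emb = (1, 12, 1024)
mel_emb_ok = (1, 40, 1024)
mel_emb_bad = (96, 40, 1024)  # batch equals num_autoregressive_samples

print(cat_shape([conds, text_emb, mel_emb_ok], dim=1))

try:
    cat_shape([conds, text_emb, mel_emb_bad], dim=1)
except ValueError as err:
    print(err)
```

The index in the message is zero-based, so "tensor number 2" corresponds to the third list entry, `second_inputs`, in the `torch.cat` call. Why that tensor ends up with the larger batch is not something the log shows.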

Talhavaival1 commented 3 weeks ago

Same issue here.

Joshua-Shepherd commented 2 weeks ago

Hey guys, I think this may have been resolved now. Jarod elaborated here: https://github.com/JarodMica/ai-voice-cloning/issues/120#issuecomment-2159225523

Can you reinstall and then try again?