JarodMica / ai-voice-cloning

GNU General Public License v3.0

Error message #51

Open searcher12 opened 7 months ago

searcher12 commented 7 months ago

I trained a model and am getting the following error message when trying to generate TTS: "Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: 'tuple' object has no attribute 'device'". I have clicked the "(Re)Compute Voice Latents" button, but the error keeps repeating and I can't generate anything. Generation works with the "random" voice when I switch back to the original autoregressive model in the settings, but it doesn't work with the new voice. Any help would be appreciated.

JarodMica commented 7 months ago

Error: 'tuple' object has no attribute 'device'

This should get resolved by generating new voice latents. Are you restarting TTS after changing autoregressive models, as shown in the video? If you could provide more of the command terminal output on startup, that might also help to figure out what's going on. It could be a CUDA issue; what is your GPU?
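For readers who land on the same message, here is a minimal sketch of where `'tuple' object has no attribute 'device'` can come from. The conditioning latents are a tuple of tensors (the settings dump further down this thread shows `'conditioning_latents': (tensor(...), tensor(...), tensor(...), None)`), and a tuple itself has no `.device`. `FakeTensor` is a hypothetical stand-in so the snippet runs without PyTorch, and `first_device` is an illustrative helper, not a function from this repository.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor: it only carries a device name."""
    device: str


def first_device(latents: Any) -> Optional[str]:
    """Return the device of the first tensor-like item, or None if there is none."""
    if hasattr(latents, "device"):
        return latents.device
    if isinstance(latents, (tuple, list)):
        for item in latents:
            found = first_device(item)
            if found is not None:
                return found
    return None


# Same shape as the latents in the log: several tensors plus a trailing None.
latents = (FakeTensor("cpu"), FakeTensor("cpu"), None)

try:
    latents.device  # assumed to mirror the failing attribute access
except AttributeError as err:
    message = str(err)

device = first_device(latents)
```

With real tensors the same idea applies, since `torch.Tensor` exposes `.device`; the point is only that the lookup has to happen per element rather than on the tuple.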

searcher12 commented 7 months ago

C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning>set PYTHONUTF8=1

C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning>runtime\python.exe .\src\main.py
2024-02-20 11:32:36 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 4080
Whisper detected
Traceback (most recent call last):
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 98, in <module>
    from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 118, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

[textbox, textbox, radio, textbox, dropdown, audio, number, slider, number, slider, slider, slider, radio, slider, slider, slider, slider, slider, slider, slider, checkboxgroup, checkbox, checkbox]
[dropdown, slider, dropdown, slider, slider, slider, slider, slider]
Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading TorToiSe... (AR: ./training/Tom/finetune/models/101_gpt.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug True
C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: ./training/Tom/finetune/models/101_gpt.pth
[2024-02-20 11:32:48,026] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.8.3+6eca037c, git-hash=6eca037c, git-branch=HEAD
[2024-02-20 11:32:48,027] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-02-20 11:32:48,027] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
[2024-02-20 11:32:48,046] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False}
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.

This is what I get when I start up the command terminal. I reinstalled the whole thing just in case I had messed something up by mistake. When I trained a voice as shown in the video and tried to generate, it gave me a different error: "Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor"

This is the command terminal:

2024-02-20 11:37:19 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-02-20 11:37:19 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-02-20 11:37:31 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[1/1] Generating line: Hello, my name is tom. Iam from the Islamic republic of pakistan
Loading voice: Tom with model 8d20e2ec
Loading voice: Tom
2024-02-20 11:37:31 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
Reading from latent: ./voices/Tom//cond_latents_8d20e2ec.pth
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 5.03, 'repetition_penalty': 5.95, 'cond_free_k': 2.25, 'num_autoregressive_samples': 4, 'sample_batch_size': 4, 'diffusion_iterations': 112, 'voice_samples': None, 'conditioning_latents': (tensor([[-2.0328, 0.5989, 0.7558, ..., 0.4945, 2.8155, 1.8861]]), tensor([[-1.0792, -1.1542, -0.6089, ..., -0.0314, 0.0723, 0.2199]]), tensor([[[[1.6425, 1.6425, 1.6425,  ..., 1.6425, 1.6425, 1.6425],
      [1.8926, 1.8926, 1.8926,  ..., 1.8926, 1.8926, 1.8926],
      [2.4791, 2.4791, 2.4791,  ..., 2.4791, 2.4791, 2.4791],
      ...,
      [1.2285, 1.2285, 1.2285,  ..., 1.2285, 1.2285, 1.2285],
      [1.2226, 1.2226, 1.2226,  ..., 1.2226, 1.2226, 1.2226],
      [1.2142, 1.2142, 1.2142,  ..., 1.2142, 1.2142, 1.2142]],

     [[1.6425, 1.6425, 1.6425,  ..., 1.6425, 1.6425, 1.6425],
      [1.8926, 1.8926, 1.8926,  ..., 1.8926, 1.8926, 1.8926],
      [2.4791, 2.4791, 2.4791,  ..., 2.4791, 2.4791, 2.4791],
      ...,
      [1.2285, 1.2285, 1.2285,  ..., 1.2285, 1.2285, 1.2285],
      [1.2226, 1.2226, 1.2226,  ..., 1.2226, 1.2226, 1.2226],
      [1.2142, 1.2142, 1.2142,  ..., 1.2142, 1.2142, 1.2142]],

     [[1.3076, 1.4206, 1.6425,  ..., 1.6425, 1.6425, 1.6425],
      [1.6435, 1.7735, 1.5302,  ..., 1.8926, 1.8926, 1.8926],
      [2.3152, 2.3541, 1.9468,  ..., 2.4791, 2.4791, 2.4791],
      ...,
      [1.2285, 1.2285, 1.2067,  ..., 1.2285, 1.2285, 1.2285],
      [1.2226, 1.2226, 1.2226,  ..., 1.2226, 1.2226, 1.2226],
      [1.2142, 1.2142, 1.2142,  ..., 1.2142, 1.2142, 1.2142]],

     ...,

     [[0.7421, 0.9486, 1.4548,  ..., 1.6425, 1.6425, 1.6425],
      [0.7732, 0.9418, 1.2630,  ..., 1.8926, 1.8926, 1.8926],
      [0.6820, 1.0005, 1.3891,  ..., 2.4791, 2.4791, 2.4791],
      ...,
      [1.0580, 1.0902, 0.9891,  ..., 1.2285, 1.2285, 1.2285],
      [1.2080, 1.1955, 1.0867,  ..., 1.2226, 1.2226, 1.2226],
      [1.0465, 1.0885, 1.0928,  ..., 1.2142, 1.2142, 1.2142]],

     [[1.6217, 1.4854, 1.4787,  ..., 1.6425, 1.6425, 1.6425],
      [1.5391, 1.3550, 1.2815,  ..., 1.8926, 1.8926, 1.8926],
      [2.0342, 1.6249, 1.5823,  ..., 2.4791, 2.4791, 2.4791],
      ...,
      [1.0962, 1.0606, 1.0592,  ..., 1.2285, 1.2285, 1.2285],
      [1.1523, 1.1778, 1.2080,  ..., 1.2226, 1.2226, 1.2226],
      [1.1018, 1.1122, 1.1327,  ..., 1.2142, 1.2142, 1.2142]],

     [[1.0799, 1.1469, 1.3332,  ..., 1.3428, 1.4276, 0.5720],
      [1.0039, 1.0236, 1.2014,  ..., 1.2535, 1.2570, 0.5889],
      [0.6524, 0.5970, 0.5753,  ..., 1.8267, 1.7102, 0.6581],
      ...,
      [1.2285, 1.2285, 1.2285,  ..., 0.5055, 0.5005, 0.5035],
      [1.2226, 1.2226, 1.2226,  ..., 0.5097, 0.5665, 0.6035],
      [1.2142, 1.2142, 1.2142,  ..., 0.4824, 0.5564, 0.6863]]]]), None), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'P', 'breathing_room': 10, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0.4, 'autoregressive_model': './training/Tom/finetune/models/101_gpt.pth', 'diffusion_model': './models/tortoise/diffusion_decoder.pth', 'tokenizer_json': './modules/tortoise-tts/tortoise/data/tokenizer.json'}

Free memory : 10.523438 (GigaBytes)
Total memory: 15.991699 (GigaBytes)
Requested memory: 1.687500 (GigaBytes)
Setting maximum total tokens (input + output) to 1024

Traceback (most recent call last):
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1235, in generate_tortoise
    gen, additionals = tts.tts(cut_text, **settings )
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\api.py", line 804, in tts
    cvvp_accumulator = cvvp_accumulator + self.cvvp(auto_conds[:, cl].repeat(batch.shape[0], 1, 1), batch, return_loss=False)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\tortoise\models\cvvp.py", line 115, in forward
    cond_emb = self.cond_emb(mel_cond).permute(0, 2, 1)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\container.py", line 215, in forward
    input = module(input)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\modules\conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\routes.py", line 394, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 1075, in process_api
    result = await self.call_function(
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\blocks.py", line 884, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\gradio\helpers.py", line 587, in tracked_fn
    response = fn(*args)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 129, in generate_proxy
    raise e
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\webui.py", line 123, in generate_proxy
    sample, outputs, stats = generate(**kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 364, in generate
    return generate_tortoise(**kwargs)
  File "C:\Users\ikras\OneDrive\Documents\Ai voice cloning\Tortoise TTS\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 1238, in generate_tortoise
    raise RuntimeError(f'Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: {e}')
RuntimeError: Possible latent mismatch: click the "(Re)Compute Voice Latents" button and then try again. Error: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
2024-02-20 11:37:34 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 500 Internal Server Error"
2024-02-20 11:37:34 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
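A note on reading this error: the `RuntimeError` says the input to the convolution is a CPU tensor (`torch.FloatTensor`) while the layer's weights are on the GPU (`torch.cuda.FloatTensor`). The latents printed in the settings dump carry no `device='cuda:0'` in their repr, which is consistent with them being on the CPU when they reach the model. One general way to handle this kind of mismatch is to move every tensor in the latent tuple to the model's device before use. The sketch below is only an illustration under that assumption: `FakeTensor` is a hypothetical stand-in so it runs without PyTorch, and `move_to_device` is not a function from this repository.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor with a matching .to()/.device surface."""
    device: str

    def to(self, device: str) -> "FakeTensor":
        # Like Tensor.to, this returns a new object rather than mutating in place.
        return FakeTensor(device)


def move_to_device(obj: Any, device: str) -> Any:
    """Recursively move tensors inside tuples/lists to `device`; None stays None."""
    if obj is None:
        return None
    if isinstance(obj, tuple):
        return tuple(move_to_device(x, device) for x in obj)
    if isinstance(obj, list):
        return [move_to_device(x, device) for x in obj]
    if hasattr(obj, "to"):
        return obj.to(device)
    return obj


# Same shape as the logged latents: three tensors on the CPU and a trailing None.
cached = (FakeTensor("cpu"), FakeTensor("cpu"), FakeTensor("cpu"), None)
on_gpu = move_to_device(cached, "cuda")
```

With real PyTorch tensors the helper body would be unchanged, because `torch.Tensor.to(device)` also returns a tensor on the requested device. Where such a call belongs in the actual code path is not something this thread establishes.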

In the voices folder, when I delete the "cond_latents" file for the voice I trained, the TTS lets me generate once without an error, but when I try to generate again, it shows the previous error again. If I delete the "cond_latents" file again, it lets me generate once more before the error comes back.

I've restarted the TTS and followed everything like in the video but an error always occurs after training a model. I have an RTX 4080.
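The delete-the-cache workaround described above can be scripted. The sketch below assumes the cache files follow the pattern seen in the log (`./voices/Tom//cond_latents_8d20e2ec.pth`); the helper names are hypothetical and not part of the repository. The pattern of one successful generation followed by failures would be consistent with freshly computed latents working while latents read back from the cached file do not, but that is a guess from the symptoms, not something confirmed here.

```python
from pathlib import Path
from typing import List


def find_latent_caches(voice_dir: Path) -> List[Path]:
    """List cached latent files such as cond_latents_8d20e2ec.pth in a voice folder."""
    return sorted(voice_dir.glob("cond_latents*.pth"))


def clear_latent_caches(voice_dir: Path, dry_run: bool = True) -> List[Path]:
    """Delete cached latent files; with dry_run=True only report what would be removed."""
    targets = find_latent_caches(voice_dir)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

For example, `clear_latent_caches(Path("./voices/Tom"))` lists the files without touching them, and passing `dry_run=False` removes them so the latents are recomputed on the next generation.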