JarodMica / ai-voice-cloning

GNU General Public License v3.0

First prompt is very fast, after that the prompt hangs or takes several times longer without using any GPU #71

Closed · LieutenantTeaTM closed 8 months ago

LieutenantTeaTM commented 8 months ago

Hello, I am not sure whether this is a bug or something with my hardware, but Tortoise TTS sometimes hangs after one successful generation. Using DeepSpeed, the Ultra Fast preset, and an RVC pass (although from the CMD output it doesn't even seem to reach RVC the second time), the second prompt can take several minutes or hang indefinitely. I noticed my RAM is almost completely maxed out while my GPU sits idle (Low VRAM mode is on). This is odd because other TTS/RVC web UIs run fine, as do other GPU-heavy web UIs such as Oobabooga's Text Generation WebUI.

Specs:
- OS: Windows 10 64-bit
- CPU: Ryzen 9 5900HS
- GPU: RTX 3060 Mobile (6 GB VRAM)
- RAM: 16 GB DDR4
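
To pin down the "RAM maxed, GPU idle" symptom, a minimal diagnostic sketch like the following can log memory around each generation. This is a hypothetical helper, not part of this repo; it assumes `torch` and `psutil` are installed:

```python
import psutil
import torch

def report_memory(tag: str) -> None:
    # GPU memory actually allocated by PyTorch tensors, in GiB
    vram_gib = torch.cuda.memory_allocated() / 2**30 if torch.cuda.is_available() else 0.0
    # System-wide RAM usage as a percentage
    ram_pct = psutil.virtual_memory().percent
    print(f"[{tag}] VRAM allocated: {vram_gib:.2f} GiB | system RAM: {ram_pct:.1f}%")

report_memory("before generation")
# ... run one TTS generation here ...
report_memory("after generation")
```

If the "after" reading shows RAM climbing while VRAM stays near zero, that would be consistent with Low VRAM mode keeping the models in system memory instead of on the GPU.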

Here is my CMD output for the entire session:

```
C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning>runtime\python.exe .\src\main.py
2024-03-30 12:39:14 | INFO | rvc.configs.config | Found GPU NVIDIA GeForce RTX 3060 Laptop GPU
Whisper detected
Traceback (most recent call last):
  File "C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 98, in <module>
    from vall_e.emb.qnt import encode as valle_quantize
ModuleNotFoundError: No module named 'vall_e'

Traceback (most recent call last):
  File "C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\src\utils.py", line 118, in <module>
    import bark
ModuleNotFoundError: No module named 'bark'

!WARNING! Automatically deduced sample batch size returned 1.
!WARNING! Automatically deduced sample batch size returned 1.
[textbox, textbox, radio, textbox, dropdown, audio, number, slider, number, slider, slider, slider, radio, slider, slider, slider, slider, slider, slider, slider, checkboxgroup, checkbox, checkbox]
[dropdown, slider, dropdown, slider, slider, slider, slider, slider]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\models\tortoise\autoregressive.pth, diffusion: ./models/tortoise/diffusion_decoder.pth, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug True
C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: ./modules/tortoise-tts/tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\models\tortoise\autoregressive.pth
[2024-03-30 12:39:31,737] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.8.3+6eca037c, git-hash=6eca037c, git-branch=HEAD
[2024-03-30 12:39:31,740] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-03-30 12:39:31,741] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
WARNING! Setting BLOOMLayerPolicy._orig_layer_class to None due to Exception: module 'transformers.models' has no attribute 'bloom'
[2024-03-30 12:39:31,831] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'fp16': False, 'pre_layer_norm': True, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'q_int8': False, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False}
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:39:56 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:40:41 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:40:41 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:40:49 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:40:49 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:40:55 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:40:55 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:41:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:41:01 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:41:04 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:41:04 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:41:19 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[1/1] Generating line: [I am really angry,] This is an RVC test prompt. I am testing longer generation. RVC is being used.
Loading voice: Myself Demo with model d1f79232
2024-03-30 12:41:19 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
Loading voice: Myself Demo
Reading from latent: ./voices/Myself Demo//cond_latents_d1f79232.pth
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 6.0, 'cond_free_k': 2.0, 'num_autoregressive_samples': 2, 'sample_batch_size': 1, 'diffusion_iterations': 50, 'voice_samples': None, 'conditioning_latents': (tensor([[ 5.5797e-04,  2.8545e+00,  4.8068e+00,  ..., -1.5647e+00,
         -1.7763e+00, -5.7516e-01]]), tensor([[-0.8919, -0.8043, -0.6150,  ..., -0.0942, -0.0069,  0.1361]])), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0, 'autoregressive_model': 'C:\\Users\\username\\Documents\\ai-voice-cloning-v2_0\\ai-voice-cloning\\models\\tortoise\\autoregressive.pth', 'diffusion_model': './models/tortoise/diffusion_decoder.pth', 'tokenizer_json': './modules/tortoise-tts/tortoise/data/tokenizer.json'}
------------------------------------------------------
Free memory : 2.631836 (GigaBytes)
Total memory: 5.999512 (GigaBytes)
Requested memory: 0.421875 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
------------------------------------------------------
Generating line took 50.12170481681824 seconds
C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torchaudio\functional\functional.py:1371: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
models\rvc_models\guard.pth
C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\runtime\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")

File finished writing to: C:\Users\username\Documents\ai-voice-cloning-v2_0\ai-voice-cloning\output\out.wav
Generation took 50.935757637023926 seconds, saved to './results//Myself Demo//Myself Demo_00005.wav'

2024-03-30 12:42:16 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:42:16 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:42:17 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
2024-03-30 12:42:17 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
2024-03-30 12:42:36 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/api/predict "HTTP/1.1 200 OK"
[1/1] Generating line: [I am really angry,] This is an RVC test prompt. I am testing longer generation. RVC is being used.
2024-03-30 12:42:37 | INFO | httpx | HTTP Request: POST http://127.0.0.1:7860/reset "HTTP/1.1 200 OK"
Loading voice: Myself Demo with model d1f79232
Loading voice: Myself Demo
Reading from latent: ./voices/Myself Demo//cond_latents_d1f79232.pth
{'temperature': 0.2, 'top_p': 0.8, 'diffusion_temperature': 1.0, 'length_penalty': 1.0, 'repetition_penalty': 6.0, 'cond_free_k': 2.0, 'num_autoregressive_samples': 2, 'sample_batch_size': 1, 'diffusion_iterations': 50, 'voice_samples': None, 'conditioning_latents': (tensor([[ 5.5797e-04,  2.8545e+00,  4.8068e+00,  ..., -1.5647e+00,
         -1.7763e+00, -5.7516e-01]]), tensor([[-0.8919, -0.8043, -0.6150,  ..., -0.0942, -0.0069,  0.1361]])), 'use_deterministic_seed': None, 'return_deterministic_state': True, 'k': 1, 'diffusion_sampler': 'DDIM', 'breathing_room': 8, 'half_p': False, 'cond_free': True, 'cvvp_amount': 0, 'autoregressive_model': 'C:\\Users\\username\\Documents\\ai-voice-cloning-v2_0\\ai-voice-cloning\\models\\tortoise\\autoregressive.pth', 'diffusion_model': './models/tortoise/diffusion_decoder.pth', 'tokenizer_json': './modules/tortoise-tts/tortoise/data/tokenizer.json'}
```

LieutenantTeaTM commented 8 months ago

Turning off Low VRAM and enabling "Use Hifigan instead of Diffusion" has mostly resolved the issue; generation now runs well. It still barely seems to use the GPU (according to Task Manager), but performance is much better.
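
For anyone hitting the same stall with Low VRAM mode enabled, a speculative mitigation (not a confirmed fix for this repo) is to release cached CUDA memory between generations. A minimal sketch, assuming PyTorch:

```python
import gc
import torch

def free_cuda_memory() -> None:
    """Best-effort cleanup between generations (speculative mitigation)."""
    gc.collect()                  # drop Python references to stale tensors
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for any in-flight kernels
        torch.cuda.empty_cache()  # return cached allocator blocks to the driver
```

If a second generation starts promptly after such a cleanup, that would point at memory pressure between the caching allocator and Low VRAM mode's CPU offloading rather than at the generation code itself.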