[Bug] Assertion srcIndex < srcSelectDimSize

Describe the bug

The server fails with the device-side assertion srcIndex < srcSelectDimSize, followed by:

RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 4096 n 108 k 1024 mat1_ld 4096 mat2_ld 1024 result_ld 4096 abcType 0 computeType 68 scaleType 0
CUDA Error Details: RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I get this error when I deploy the model with FastAPI. If I send a 300-word request twice, the first request is processed fine and the second one triggers the error. The error seems to point at GPU memory, but my memory usage never peaked or reached the limit.

This is how I load the model:

tts = TTS(model_path="./models/xtts", config_path='./models/xtts/config.json').to(device)

This is how I generate the file:

tts.tts_to_file(text=text, speaker_wav=f"./voices/{voice}", language=language, file_path=output_file)

Am I loading or using the model incorrectly, or should I limit the number of words per request?
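One workaround worth trying, assuming the failure is tied to input length, is to split long text into shorter chunks and synthesize each chunk separately. The sketch below is only an illustration: split_text is a hypothetical helper, not part of the TTS library, and max_chars=250 is an assumed value rather than a confirmed limit.

```python
import re


def split_text(text, max_chars=250):
    """Split text into chunks of at most max_chars characters.

    Hypothetical helper (not part of the TTS library). Sentence boundaries
    are preferred; a sentence longer than max_chars is cut at a space, or
    hard-cut if it contains no space. max_chars=250 is an assumed value.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Break up a single sentence that is longer than the limit.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:cut])
            sentence = sentence[cut:].lstrip()
        if not sentence:
            continue
        # Pack whole sentences together while they still fit.
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to tts.tts_to_file with its own file_path and the resulting audio files concatenated. If the error disappears with short chunks, that would suggest the long input is what pushes an index out of range.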

To Reproduce

This error occurs when I deploy the model with FastAPI. If I send a 600-word request twice, the first request is processed fine and the second one triggers the error. The error seems to point at GPU memory, but my memory usage never peaked or reached the limit.

Expected behavior

No response

Logs

../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [3,0,0], thread: [93,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [3,0,0], thread: [94,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:1236: indexSelectSmallIndex: block: [3,0,0], thread: [95,0,0] Assertion srcIndex < srcSelectDimSize failed.
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul with transpose_mat1 0 transpose_mat2 0 m 4096 n 108 k 1024 mat1_ld 4096 mat2_ld 1024 result_ld 4096 abcType 0 computeType 68 scaleType 0
CUDA Error Details: RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

CUDA Error Details: ERROR:root:RuntimeError during TTS generation: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
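For context when reading these logs: the srcIndex < srcSelectDimSize assertion comes from the index-select kernel and fires when an index is not smaller than the size of the dimension being indexed, which typically means an out-of-range lookup such as a token or position ID past the end of an embedding table. Because CUDA errors are reported asynchronously, the cuBLAS failure is likely a downstream symptom rather than the root cause. A minimal debugging sketch follows. CUDA_LAUNCH_BLOCKING is a real CUDA environment variable, while check_indices is a hypothetical helper that only mirrors the failing condition on the host.

```python
import os

# Make kernel launches synchronous so the Python traceback points at the
# call that actually failed. This must be set before CUDA is initialized,
# ideally before torch is imported.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def check_indices(indices, table_size):
    """Hypothetical host-side mirror of the device assertion.

    Raises IndexError if any index falls outside [0, table_size).
    """
    bad = [(pos, idx) for pos, idx in enumerate(indices)
           if not 0 <= idx < table_size]
    if bad:
        pos, idx = bad[0]
        raise IndexError(
            f"{len(bad)} index(es) out of range for size {table_size}; "
            f"first offender: position {pos}, value {idx}"
        )
```

Running the same request on the CPU is another way to get a readable error, since an out-of-range index there raises an ordinary Python exception. Note also that once a device-side assert has fired, the CUDA context of that process is unusable, so the server process has to be restarted before further requests can succeed.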

Environment

- XTTS 2.2
- CUDA 11.8.0
- Python 3.8
- Ubuntu 22.04
- pip3 install torch==2.3.1+cu118 torchaudio==2.3.1+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
- GPU: T4 or 3060 Ti

Additional context

No response

coqui-ai / TTS