Float 16 to Float 32 quantization

gowthm-r7 commented 4 months ago

I used to run this code on my machine without any issues, and it worked fine as recently as four days ago. But now, for no reason I can identify, I get the warning message below and the output arrives with huge latency.

[2024-06-26 14:48:06.524] [ctranslate2] [thread 21444] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
[2024-06-26 14:48:11.765] [ctranslate2] [thread 3964] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.

My PC has an RTX 3050 GPU and a 12th-gen Intel Core i5 processor, and it used to work with minimal latency. I haven't updated my PC or my graphics driver. Please help me resolve this issue.

KoljaB commented 4 months ago

The warning can be ignored; it should not be the cause. I can't really tell what is going on, since I haven't run into this myself so far. The most likely cause is low VRAM because other applications are consuming it, so the model runs slowly or falls back to the CPU. Does this also happen after restarting your PC, with no other applications consuming VRAM?
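CTranslate2 emits this warning when float16 weights are loaded on a device without efficient float16 support and converts them to float32, which is what happens, for example, when the model ends up on the CPU. A minimal sketch of choosing the device and compute type explicitly with ctranslate2.get_cuda_device_count, ctranslate2.get_supported_compute_types and faster_whisper's WhisperModel, so the choice is printed instead of happening silently. The preference order and the helper names pick_compute_type and load_model are illustrative assumptions:

```python
from typing import Iterable

# Illustrative preference order (an assumption, not a library default):
# faster types first, float32 as the fallback supported on CPU and CUDA.
PREFERRED = ("float16", "int8_float16", "int8", "float32")


def pick_compute_type(supported: Iterable[str], preferred=PREFERRED) -> str:
    """Return the first preferred compute type that the device supports."""
    supported = set(supported)
    for compute_type in preferred:
        if compute_type in supported:
            return compute_type
    return "float32"


def load_model(model_size: str = "tiny"):
    """Hypothetical helper: load faster_whisper with an explicit device/compute type."""
    # Imported here so pick_compute_type works without these packages installed.
    import ctranslate2
    from faster_whisper import WhisperModel

    device = "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
    compute_type = pick_compute_type(ctranslate2.get_supported_compute_types(device))
    print(f"Loading {model_size} on {device} with compute_type={compute_type}")
    return WhisperModel(model_size, device=device, compute_type=compute_type)
```

If load_model() prints "cpu" on a machine with an RTX card, the slowdown is a device problem rather than a precision problem.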

gowthm-r7 commented 4 months ago

The same thing happens. I even changed the default GPU from Intel UHD to the RTX 3050 in the NVIDIA Control Panel, and in the Graphics settings I set Visual Studio and Python to use the RTX 3050 instead of letting Windows choose automatically. All my drivers are up to date. I also uninstalled all the dependencies (CUDA, cuDNN, etc.) and reinstalled them from scratch. No change. While the program runs, VRAM usage stays at just 4-7 percent; only my RAM usage climbs, up to 80 percent. Utilization of both GPUs (Intel UHD and RTX 3050) stays at just 2-4 percent.
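Low GPU utilization combined with climbing system RAM usually suggests that inference is running on the CPU. A minimal sketch for watching the NVIDIA GPU while the program runs, assuming nvidia-smi (installed with the NVIDIA driver) is on PATH. The helper names are hypothetical:

```python
import subprocess
from typing import Optional

QUERY = "name,utilization.gpu,memory.used,memory.total"


def _num(field: str) -> Optional[int]:
    field = field.strip()
    # nvidia-smi may print a placeholder such as "[N/A]" for unsupported fields
    return int(field) if field.isdigit() else None


def parse_gpu_line(line: str) -> dict:
    """Parse one line of nvidia-smi CSV output (noheader, nounits)."""
    # rsplit keeps any comma inside the GPU name intact
    name, util, used, total = line.rsplit(",", 3)
    return {
        "name": name.strip(),
        "utilization_pct": _num(util),
        "memory_used_mib": _num(used),
        "memory_total_mib": _num(total),
    }


def query_gpus() -> list:
    """Run nvidia-smi once and return one dict per NVIDIA GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_gpu_line(line) for line in out.splitlines() if line.strip()]
```

Calling query_gpus() every second or so during transcription should show utilization and memory rising if the model really runs on the RTX 3050.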

I don't understand why this happens.

KoljaB commented 4 months ago

Let's check if your system is correctly recognizing and utilizing the GPU. Please run the following diagnostic code and share the output:

import torch
import ctranslate2

# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")

# Get the number of available GPUs
print(f"Number of GPUs: {torch.cuda.device_count()}")

# Get the name of the current GPU
if torch.cuda.is_available():
    print(f"Current GPU: {torch.cuda.get_device_name(0)}")

# Create a sample tensor and move it to GPU
x = torch.rand(5, 3)
if torch.cuda.is_available():
    x = x.cuda()
    print(f"Tensor is on GPU: {x.is_cuda}")
else:
    print("Tensor is on CPU")

# Check the device of the tensor
print(f"Device: {x.device}")

# Perform a simple operation to test GPU usage
if torch.cuda.is_available():
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()
    result = torch.matmul(x, x.t())
    end.record()

    # Waits for everything to finish running
    torch.cuda.synchronize()

    print(f"GPU computation time: {start.elapsed_time(end)} milliseconds")

# Check CTranslate2 device info
print(f"CTranslate2 GPU supported types: {ctranslate2.get_supported_compute_types('cuda')}")

The output should contain the lines "CUDA available: True" and "Tensor is on GPU: True", and it should list some GPU-supported compute types for CTranslate2, like this:

CUDA available: True
Number of GPUs: 1
Current GPU: NVIDIA GeForce RTX 2080 SUPER
Tensor is on GPU: True
Device: cuda:0
GPU computation time: 26.155040740966797 milliseconds
CTranslate2 GPU supported types: {'float32', 'float16', 'int8_float16', 'int8', 'int8_float32'}

gowthm-r7 commented 4 months ago

This is what I get as output.

CUDA available: False
Number of GPUs: 0
Tensor is on CPU
Device: cpu
CTranslate2 GPU supported types: {'float32', 'int8_float32', 'int8_float16', 'int8_bfloat16', 'float16', 'bfloat16', 'int8'}

But when I run 'nvcc --version' in the command prompt to check the installed CUDA version, it prints:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

I don't know why "CUDA available" says False.

KoljaB commented 4 months ago

To ensure that PyTorch is installed with CUDA support, first verify your installed PyTorch version:

pip show torch

If it shows version 2.1.2, then install the CUDA-compatible version by running:

pip install torch==2.1.2+cu118 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118

This command reinstalls PyTorch and Torchaudio with explicit support for CUDA, specifically for CUDA version 11.8.
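'nvcc --version' only reports the CUDA toolkit installed on the system. PyTorch pip wheels bring their own CUDA runtime, so a CPU-only torch wheel reports "CUDA available: False" even when the toolkit is present. A small sketch that distinguishes the cases using torch.version.cuda, which is None for CPU-only builds. The function names describe_torch_build and check_installed_torch are hypothetical:

```python
from typing import Optional


def describe_torch_build(version: str, cuda_build: Optional[str], cuda_available: bool) -> str:
    """Explain why torch.cuda.is_available() returned what it did."""
    if cuda_build is None:
        # CPU-only wheel: installing the CUDA toolkit (nvcc) cannot change this
        return f"torch {version} is a CPU-only build; reinstall a CUDA wheel (e.g. +cu118)"
    if not cuda_available:
        return f"torch {version} was built for CUDA {cuda_build}, but no usable GPU/driver was found"
    return f"torch {version} with CUDA {cuda_build} is using the GPU"


def check_installed_torch() -> str:
    import torch  # imported lazily so the helper above works without torch

    return describe_torch_build(
        torch.__version__, torch.version.cuda, torch.cuda.is_available()
    )
```

print(check_installed_torch()) run inside the project's environment should make it clear whether the reinstall is needed.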

gowthm-r7 commented 4 months ago

Name: torch
Version: 2.3.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages
Requires: filelock, fsspec, jinja2, mkl, networkx, sympy, typing-extensions
Required-by: openai-whisper, RealTimeSTT, torchaudio, torchvision

I have PyTorch version 2.3.0. Should I now install the CUDA-compatible version by running this?

pip install torch==2.3.0+cu118 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118

KoljaB commented 4 months ago

Yes, exactly.

gowthm-r7 commented 4 months ago

I've installed it, but I got this error message at the end:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.18.1+cu118 requires torch==2.3.1+cu118, but you have torch 2.3.0+cu118 which is incompatible.

KoljaB commented 4 months ago

In this case I'd try:

pip install torch==2.3.1+cu118 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/cu118

gowthm-r7 commented 4 months ago

Now I'm getting this error after installing:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
realtimestt 0.1.16 requires torch==2.3.0, but you have torch 2.3.1+cu118 which is incompatible.
realtimestt 0.1.16 requires torchaudio==2.3.0, but you have torchaudio 2.3.1+cu118 which is incompatible.

KoljaB commented 4 months ago

This can be ignored; it should work nevertheless. It only appears because RealtimeSTT pins a fixed torch version. The alternative would be to stick with pip install torch==2.3.0+cu118 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118 and downgrade torchvision to a version that is compatible with torch==2.3.0 (I'm currently unsure which version that would be).
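These conflicts come from torch, torchaudio and torchvision being released in lockstep. A sketch that checks the installed trio with importlib.metadata. It assumes a pairing rule that matches the torch 2.1 to 2.3 releases but does not hold for 2.0.x: torchaudio shares torch's version, and torchvision's minor version is torch's minor plus 15 with the same patch, so under that rule torch 2.3.0 would pair with torchvision 0.18.0. Local labels such as +cu118 are ignored when comparing, and all helper names are hypothetical:

```python
from importlib import metadata
from typing import Optional


def base_version(version: str) -> str:
    """Drop a local label: '2.3.1+cu118' -> '2.3.1'."""
    return version.split("+", 1)[0]


def expected_torchvision(torch_version: str) -> str:
    """torchvision paired with a torch 2.x release (assumed rule, 2.1+ only)."""
    major, minor, patch = (int(p) for p in base_version(torch_version).split(".")[:3])
    if major != 2 or minor < 1:
        raise ValueError(f"pairing rule not assumed to hold for torch {torch_version}")
    return f"0.{minor + 15}.{patch}"


def installed(name: str) -> Optional[str]:
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_torch_trio() -> list:
    """Return mismatch messages; an empty list means the trio looks consistent."""
    torch_v = installed("torch")
    if torch_v is None:
        return ["torch is not installed"]
    problems = []
    audio_v = installed("torchaudio")
    if audio_v and base_version(audio_v) != base_version(torch_v):
        problems.append(f"torchaudio {audio_v} does not match torch {torch_v}")
    vision_v = installed("torchvision")
    if vision_v:
        try:
            expected = expected_torchvision(torch_v)
        except ValueError as exc:  # unusual version string or unsupported release
            problems.append(str(exc))
        else:
            if base_version(vision_v) != expected:
                problems.append(
                    f"torchvision {vision_v} does not match torch {torch_v} (expected {expected})"
                )
    return problems
```

The RealtimeSTT pin warning is deliberately not part of this check, since, as noted above, it can be ignored.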

KoljaB commented 4 months ago

If you retry the system check code it should now display "CUDA available: True" and also RealtimeSTT should work fast now.

gowthm-r7 commented 4 months ago

Yes. The output of the system check code is now:

CUDA available: True
Number of GPUs: 1
Current GPU: NVIDIA GeForce RTX 3050 4GB Laptop GPU
Tensor is on GPU: True
Device: cuda:0
GPU computation time: 36.12160110473633 milliseconds
CTranslate2 GPU supported types: {'int8_float16', 'int8_float32', 'float16', 'int8_bfloat16', 'bfloat16', 'int8', 'float32'}

Also, when I run the RealtimeSTT browser_client code, I no longer get the warning message. But I get this error as soon as I start speaking:

RealTimeSTT: root - ERROR - Unhandled exeption in _realtime_worker: Library cublas64_12.dll is not found or cannot be loaded
Exception in thread Thread-4 (_realtime_worker):
Traceback (most recent call last):
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\RealtimeSTT\audio_recorder.py", line 1303, in _realtime_worker
    self.realtime_transcription_text = " ".join(
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\RealtimeSTT\audio_recorder.py", line 1303, in <genexpr>
    self.realtime_transcription_text = " ".join(
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\faster_whisper\transcribe.py", line 573, in generate_segments
    encoder_output = self.encode(segment)
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\faster_whisper\transcribe.py", line 824, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded
ERROR:root:General transcription error: Library cublas64_12.dll is not found or cannot be loaded
RealTimeSTT: root - ERROR - Library cublas64_12.dll is not found or cannot be loaded
Exception in thread Thread-1 (recorder_thread):
Traceback (most recent call last):
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\gowth\gr_viscode\whisper\static\server.py", line 180, in recorder_thread
    full_sentence = recorder.text()
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\RealtimeSTT\audio_recorder.py", line 894, in text
    return self.transcribe()
  File "C:\Users\gowth\Desktop\STS 4THSEM FAT\Lib\site-packages\RealtimeSTT\audio_recorder.py", line 853, in transcribe
    raise Exception(result)
Exception: Library cublas64_12.dll is not found or cannot be loaded

KoljaB commented 4 months ago

Search for cublas64_11.dll, create a copy of it in the same folder, and name the copy cublas64_12.dll. Torch messed up its references with 2.2.2.

KoljaB commented 4 months ago

It should be somewhere in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin (on Windows).
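The copy workaround can be scripted. A sketch, assuming the default CUDA 11.8 install path mentioned above; writing into C:\Program Files usually requires an elevated (administrator) prompt. The function name make_cublas12_alias is hypothetical:

```python
import shutil
from pathlib import Path

# Default CUDA 11.8 location on Windows; adjust if the toolkit lives elsewhere.
CUDA_BIN = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin")


def make_cublas12_alias(bin_dir: Path = CUDA_BIN) -> Path:
    """Copy cublas64_11.dll to cublas64_12.dll in the same folder (never overwrites)."""
    source = bin_dir / "cublas64_11.dll"
    target = bin_dir / "cublas64_12.dll"
    if not source.is_file():
        raise FileNotFoundError(f"{source} not found; search for cublas64_11.dll elsewhere")
    if not target.exists():
        shutil.copy2(source, target)  # the original DLL stays untouched
    return target
```

The copied DLL can only be picked up if that bin folder is on PATH, which the CUDA installer normally arranges.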

KoljaB commented 4 months ago

The underlying reason is that CTranslate2 changed: its default version now requires CUDA 12, and faster_whisper updated its dependency to use that newer CTranslate2 version. So another solution would be to downgrade CTranslate2 to version 3.24.0 with this command:

pip install --upgrade --force-reinstall ctranslate2==3.24.0

Just copying cublas64_11.dll to cublas64_12.dll might be cleaner though.
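To see which side of that change an environment is on, a sketch that reads the installed CTranslate2 version with importlib.metadata. It assumes that the 4.x releases expect CUDA 12 while 3.x releases such as 3.24.0 work with CUDA 11; that boundary is an assumption consistent with the explanation above. The helper names are hypothetical:

```python
from importlib import metadata
from typing import Optional


def cuda_needed_by_ctranslate2(version: str) -> str:
    """Assumed rule: CTranslate2 4.x wheels target CUDA 12, 3.x wheels CUDA 11."""
    major = int(version.split(".", 1)[0])
    return "12" if major >= 4 else "11"


def installed_ctranslate2() -> Optional[str]:
    """Return the installed CTranslate2 version, or None if it is missing."""
    try:
        return metadata.version("ctranslate2")
    except metadata.PackageNotFoundError:
        return None
```

If this reports CUDA 12 while only the 11.8 toolkit is installed, one of the two workarounds above is needed.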

gowthm-r7 commented 4 months ago

Yes! Now it's working perfectly. Thanks a lot :)

Will I face any problems like this again in future? Or what should be done to avoid these issues?

KoljaB commented 4 months ago

Great to hear that your setup is now working!

"Will I face any problems like this again in future? Or what should be done to avoid these issues?" If you don't touch the environment, it won't happen again. Using a virtual environment can help isolate your environment from others. If you install other libraries into the same environment, this could in rare cases cause problems.

The reason is that the nature of the Python ecosystem makes environments inherently unstable. Even fixed versions in the requirements do not ensure stability, because transitive dependencies (the libraries that RealtimeSTT's direct dependencies rely on) may update independently, potentially introducing incompatibilities or breaking changes over time. So it can never be fully guaranteed that a release of RealtimeSTT, even with fixed library versions in requirements.txt, will not break in future installations.
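The isolation and pinning advice can be sketched with the standard library alone. venv.create makes a fresh environment (equivalent to python -m venv), and importlib.metadata lists every installed distribution, transitive ones included, so the exact working set can be recorded much as pip freeze does. This is a partial mitigation, not a guarantee; the helper names, the folder name rtstt-env and the file name requirements-lock.txt are hypothetical:

```python
import venv
from importlib import metadata
from pathlib import Path


def create_isolated_env(env_dir: Path, with_pip: bool = True) -> Path:
    """Create a virtual environment that ignores the global site-packages."""
    venv.create(env_dir, system_site_packages=False, with_pip=with_pip)
    return env_dir / "pyvenv.cfg"


def freeze_environment() -> list:
    """Return sorted 'name==version' pins for every distribution visible to
    the running interpreter, transitive dependencies included."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_lock_file(path: str = "requirements-lock.txt") -> None:
    """Write the pins so the same working set can be reinstalled later."""
    Path(path).write_text("\n".join(freeze_environment()) + "\n", encoding="utf-8")
```

Run write_lock_file() with the interpreter of the environment that currently works. Pins with local labels such as torch==2.3.1+cu118 still need the matching --index-url when reinstalling.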

gowthm-r7 commented 4 months ago

Okay, got it. Thanks for your rapid responses. Kudos :)

KoljaB / RealtimeSTT, issue #78