Check #260
Is it necessary to use torch 1.13? In any case, there is a conflict between ctranslate2 and pytorch. Please check the faster-whisper repo; in the latest issue you'll find a compatibility matrix. Try it and let me know how it goes.
@MahmoudAshraf97 torch 1.13 is actually required. CUDA 11.6 fixes the issue where the kernel dies for whisperx, and it's installed with the command below. CUDA 11.6 can't be installed alongside torch 2.0.0 due to PyTorch compatibility constraints. The commands below are from https://pytorch.org/get-started/previous-versions/
CUDA 11.6
This command fixed the issue with the kernel dying.
pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
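As a quick sanity check after installing (a generic check, not specific to this repo), the torch build and the CUDA version it was compiled against can be printed like this:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"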
If I run this command against the python environment created by the whisper-diarization repo, there are conflicts. Here's my shell output.
khalid@khalid-Precision-7920-Tower:~$ source whisper_env3/bin/activate
(whisper_env3) khalid@khalid-Precision-7920-Tower:~$ pip install torch==1.13.0+cu116 torchvision==0.14.0+cu116 torchaudio==0.13.0 --extra-index-url https://download.pytorch.org/whl/cu116
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu116
Collecting torch==1.13.0+cu116
Downloading https://download.pytorch.org/whl/cu116/torch-1.13.0%2Bcu116-cp310-cp310-linux_x86_64.whl (1983.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 GB 24.1 MB/s eta 0:00:00
Collecting torchvision==0.14.0+cu116
Downloading https://download.pytorch.org/whl/cu116/torchvision-0.14.0%2Bcu116-cp310-cp310-linux_x86_64.whl (24.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.2/24.2 MB 52.9 MB/s eta 0:00:00
Collecting torchaudio==0.13.0
Downloading https://download.pytorch.org/whl/cu116/torchaudio-0.13.0%2Bcu116-cp310-cp310-linux_x86_64.whl (4.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.2/4.2 MB 45.8 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./whisper_env3/lib/python3.10/site-packages (from torch==1.13.0+cu116) (4.12.2)
Requirement already satisfied: numpy in ./whisper_env3/lib/python3.10/site-packages (from torchvision==0.14.0+cu116) (1.26.3)
Requirement already satisfied: requests in ./whisper_env3/lib/python3.10/site-packages (from torchvision==0.14.0+cu116) (2.32.3)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./whisper_env3/lib/python3.10/site-packages (from torchvision==0.14.0+cu116) (10.2.0)
Requirement already satisfied: charset-normalizer<4,>=2 in ./whisper_env3/lib/python3.10/site-packages (from requests->torchvision==0.14.0+cu116) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in ./whisper_env3/lib/python3.10/site-packages (from requests->torchvision==0.14.0+cu116) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./whisper_env3/lib/python3.10/site-packages (from requests->torchvision==0.14.0+cu116) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in ./whisper_env3/lib/python3.10/site-packages (from requests->torchvision==0.14.0+cu116) (2024.8.30)
Installing collected packages: torch, torchvision, torchaudio
Attempting uninstall: torch
Found existing installation: torch 1.13.1+cu116
Uninstalling torch-1.13.1+cu116:
Successfully uninstalled torch-1.13.1+cu116
Attempting uninstall: torchvision
Found existing installation: torchvision 0.14.1+cu116
Uninstalling torchvision-0.14.1+cu116:
Successfully uninstalled torchvision-0.14.1+cu116
Attempting uninstall: torchaudio
Found existing installation: torchaudio 0.13.1+cu116
Uninstalling torchaudio-0.13.1+cu116:
Successfully uninstalled torchaudio-0.13.1+cu116
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lightning 2.4.0 requires torch<4.0,>=2.1.0, but you have torch 1.13.0+cu116 which is incompatible.
pyannote-audio 3.1.1 requires torch>=2.0.0, but you have torch 1.13.0+cu116 which is incompatible.
pyannote-audio 3.1.1 requires torchaudio>=2.0.0, but you have torchaudio 0.13.0+cu116 which is incompatible.
pytorch-lightning 2.4.0 requires torch>=2.1.0, but you have torch 1.13.0+cu116 which is incompatible.
whisperx 3.1.1 requires torch>=2, but you have torch 1.13.0+cu116 which is incompatible.
whisperx 3.1.1 requires torchaudio>=2, but you have torchaudio 0.13.0+cu116 which is incompatible.
Successfully installed torch-1.13.0+cu116 torchaudio-0.13.0+cu116 torchvision-0.14.0+cu116
(whisper_env3) khalid@khalid-Precision-7920-Tower:~$
For torch 2.0.0, here are the options. Notice that there is no CUDA 11.6 option for torch 2.0.0. These commands were pulled directly from https://pytorch.org/get-started/previous-versions/
CUDA 11.7
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
CUDA 11.8
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
I will give the matrix a try tomorrow and create a new py env with cuda 12.1
Torch Version | CT2 Version
---|---
2.*.*+cu121 | <=4.4.0
2.*.*+cu124 | >=4.5.0
>=2.4.0 | >=4.5.0
<2.4.0 | <4.5.0
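To see which row of that matrix a given environment actually falls into, it's enough to print the installed versions (a minimal check, not from the repo):
python -c "import torch, ctranslate2; print('torch', torch.__version__, '| ctranslate2', ctranslate2.__version__)"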
I upgraded to cuda 12.1 and ran the installation accordingly. I was able to confirm cuda 12.1 from my notebook. Here are the commands I ran.
source: https://pytorch.org/get-started/previous-versions/
CUDA 12.1
pip install torch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 --index-url https://download.pytorch.org/whl/cu121
source: https://github.com/SYSTRAN/faster-whisper/issues/1086#issuecomment-2435924887
pip install faster-whisper ctranslate2==4.4.0
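Before going back to the notebook, it can help to reproduce the crash outside Jupyter so the real traceback is visible instead of just a dead kernel. A minimal faster-whisper sketch, assuming a local audio.wav and the same float16 setting the notebook uses:
# minimal faster-whisper smoke test; run as a plain script so a crash shows a traceback
# "small" and audio.wav are arbitrary placeholders
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav")
print("detected language:", info.language)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")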
I ran into the same issue where the kernel dies. Here is the shell output.
[I 2024-10-24 20:26:57.588 ServerApp] Restoring connection for 59aba1d2-0cfe-408d-8e36-0544cb64ddee:44a0da2d-1890-413f-973d-f77c378c1c36
[I 2024-10-24 20:27:08.522 ServerApp] Saving file at /work/whisper-diarization/Whisper_Transcription_+_NeMo_Diarization.ipynb
[W 2024-10-24 20:27:13.799 ServerApp] 404 GET /api/.cache/torch/whisperx-vad-segmentation.bin?content=0&hash=0&1729819633648 (3b0f0f4d690a490f8e1b1d6254c68218@127.0.0.1) 36.71ms referer=http://localhost:8889/lab/tree/work/whisper-diarization/Whisper_Transcription_%2B_NeMo_Diarization.ipynb
[I 2024-10-24 20:27:18.536 ServerApp] AsyncIOLoopKernelRestarter: restarting kernel (1/5), keep random ports
[W 2024-10-24 20:27:18.537 ServerApp] kernel 59aba1d2-0cfe-408d-8e36-0544cb64ddee restarted
[I 2024-10-24 20:27:18.574 ServerApp] Starting buffering for 59aba1d2-0cfe-408d-8e36-0544cb64ddee:44a0da2d-1890-413f-973d-f77c378c1c36
[I 2024-10-24 20:27:18.589 ServerApp] Connecting to kernel 59aba1d2-0cfe-408d-8e36-0544cb64ddee.
[I 2024-10-24 20:27:18.589 ServerApp] Restoring connection for 59aba1d2-0cfe-408d-8e36-0544cb64ddee:44a0da2d-1890-413f-973d-f77c378c1c36
This is the notebook cell and output
compute_type = "float16"
# or run on GPU with INT8
# compute_type = "int8_float16"
# or run on CPU with INT8
# compute_type = "int8"
whisper_results, language, audio_waveform = transcribe_batched(
vocal_target,
language,
batch_size,
whisper_model_name,
compute_type,
suppress_numerals,
device,
)
OUTPUT
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.4.0. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../.cache/torch/whisperx-vad-segmentation.bin`
No language specified, language will be first be detected for each audio file (increases inference time).
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.4.1+cu121. Bad things might happen unless you revert torch to 1.x.
[NeMo W 2024-10-24 20:27:14 nemo_logging:349] [/home/khalid/whisper_env2/lib/python3.10/site-packages/pyannote/audio/utils/reproducibility.py:74](http://localhost:8889/lab/tree/work/whisper-diarization/whisper_env2/lib/python3.10/site-packages/pyannote/audio/utils/reproducibility.py#line=73): ReproducibilityWarning: TensorFloat-32 (TF32) has been disabled as it might lead to reproducibility issues and lower accuracy.
It can be re-enabled by calling
>>> import torch
>>> torch.backends.cuda.matmul.allow_tf32 = True
>>> torch.backends.cudnn.allow_tf32 = True
See https://github.com/pyannote/pyannote-audio/issues/1370 for more details.
warnings.warn(
I find it odd that my local Ubuntu 20.04 instance has this issue, but Google Colab doesn't. This seems like a challenging issue; I appreciate you!
It's challenging indeed! But the problem boils down to conflicting CUDA installations. I suggest you start with a fresh environment and install the requirements as mentioned in the readme; this is guaranteed to work and is tested weekly on several Python versions.
I'm ensuring my ubuntu 20.04 environment has only 1 version of cuda installed so I ran a clean uninstall + a fresh install of CUDA 12.4, but I'm still running into the issue with the kernel dying. Is it possible that I'm installing CUDA incorrectly? Or maybe the wrong CUDA version?
# remove cuda - from nvidias website
# https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#removing-cuda-toolkit
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
"*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*" --yes
sudo apt-get autoremove --purge -V --yes
# https://developer.nvidia.com/cuda-12-4-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubun>
# install CUDA 12.4
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-4
ENV_NAME=whisper_env10_cuda124
python3.10 -m venv ~/$ENV_NAME
source ~/$ENV_NAME/bin/activate
pip install --upgrade pip
pip install ipykernel
pip install jupyter-lab
python -m ipykernel install --user --name $ENV_NAME --display-name "$ENV_NAME"
# mahmoud whisper commands
sudo apt update && sudo apt install cython3 --yes
sudo apt update && sudo apt install ffmpeg --yes
pip install -c constraints.txt -r requirements.txt
# pytorch CUDA 12.4
# https://pytorch.org/get-started/locally/
pip3 install torch torchvision torchaudio
# CUDA 12.4
pip install ctranslate2==4.5.0
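It might also be worth checking whether CTranslate2 itself can see the GPU with this combination before launching the notebook (a quick check, not part of the repo's instructions):
python -c "import ctranslate2; print('CUDA devices visible to ctranslate2:', ctranslate2.get_cuda_device_count())"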
Here are some other details on my env
(whisper_env10_cuda124) khalid@khalid-Precision-7920-Tower:~/work/whisper-diarization$ pip list | grep -i "cuda"
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
(whisper_env10_cuda124) khalid@khalid-Precision-7920-Tower:~/work/whisper-diarization$ pip list | grep -i "torch"
pytorch-lightning 2.4.0
pytorch-metric-learning 2.6.0
torch 2.5.0
torch-audiomentations 0.11.1
torch_pitch_shift 1.2.5
torchaudio 2.5.0
torchmetrics 1.5.1
torchvision 0.20.0
(whisper_env10_cuda124) khalid@khalid-Precision-7920-Tower:~/work/whisper-diarization$ whereis cuda
cuda: /usr/local/cuda
I didn't install nvidia-smi or nvcc because one of them results in a CUDA 10.1 installation, which could create issues with conflicting CUDA versions.
(whisper_env10_cuda124) khalid@khalid-Precision-7920-Tower:~/work/whisper-diarization$ nvidia-smi
Command 'nvidia-smi' not found, but can be installed with:
sudo apt install nvidia-340 # version 340.108-0ubuntu5.20.04.2, or
sudo apt install nvidia-utils-390 # version 390.157-0ubuntu0.20.04.1
sudo apt install nvidia-utils-450-server # version 450.248.02-0ubuntu0.20.04.1
sudo apt install nvidia-utils-470 # version 470.256.02-0ubuntu0.20.04.1
sudo apt install nvidia-utils-470-server # version 470.256.02-0ubuntu0.20.04.1
sudo apt install nvidia-utils-535 # version 535.183.01-0ubuntu0.20.04.1
sudo apt install nvidia-utils-535-server # version 535.183.06-0ubuntu0.20.04.1
sudo apt install nvidia-utils-550-server # version 550.90.07-0ubuntu0.20.04.2
sudo apt install nvidia-utils-435 # version 435.21-0ubuntu7
sudo apt install nvidia-utils-440 # version 440.82+really.440.64-0ubuntu6
sudo apt install nvidia-utils-418-server # version 418.226.00-0ubuntu0.20.04.2
(whisper_env10_cuda124) khalid@khalid-Precision-7920-Tower:~/work/whisper-diarization$ nvcc --version
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
You did install two CUDAs, one using apt and another using pip. On a fresh Ubuntu, just install the requirements and CUDA will be installed along with torch; you just might need to include their directories in PATH.
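For reference, when CUDA comes from pip wheels the libraries live under site-packages/nvidia/* rather than /usr/local/cuda, and the faster-whisper README suggests exporting them roughly like this (a sketch, assuming the nvidia-cublas-cu12 and nvidia-cudnn-cu12 wheels that torch pulls in are present):
export LD_LIBRARY_PATH=$(python3 -c 'import os, nvidia.cublas.lib, nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))')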
@MahmoudAshraf97 Thanks for updating the code to use faster-whisper. I just tried installing the requirements in a brand new Docker container with Ubuntu 22.04. I didn't install CUDA from NVIDIA's website; I let the requirements handle CUDA. I noticed that torch.cuda.is_available() returns True and torch.version.cuda returns 12.4, yet the kernel still dies.
In addition to that, when I type whereis cuda in bash, there is no path for CUDA, so there is no export path to add that I know of, unless maybe I'm missing a step.
Since I'm running a container, I've already configured Docker for GPU access. I'm also running the commands as follows:
# if you are installing a newly hosted ubuntu image then run
# these commands so that docker can access the GPU
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nano /etc/docker/daemon.json
# add this to daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
# run docker container
sudo docker run --gpus all -it -p 8888:8888 \
-v /home/khalid/work:/container/work \
-v /home/khalid/whisper_data:/container/whisper_data \
ubuntu:22.04 /bin/bash
# run jupyter
jupyter lab --ip=0.0.0.0 --port=8888 --allow-root --no-browser
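Inside the container it may also be worth confirming which cuDNN build torch is loading, since a cuDNN mismatch between torch and CTranslate2 can crash the process without a Python traceback (a quick check, not from the repo):
python -c "import torch; print('cuda', torch.version.cuda, '| cudnn', torch.backends.cudnn.version())"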
When you do weekly testing against multiple python versions, what version of ubuntu are you using?
If you are going to use Docker, just use a prebuilt image that has CUDA set up; check the NVIDIA or torch image repos.
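For example, something along these lines (the exact tag is an assumption; any CUDA 12.4 runtime image with cuDNN should do):
# start from NVIDIA's prebuilt CUDA + cuDNN runtime image instead of bare ubuntu:22.04
sudo docker run --gpus all -it -p 8888:8888 \
  -v /home/khalid/work:/container/work \
  -v /home/khalid/whisper_data:/container/whisper_data \
  nvidia/cuda:12.4.1-cudnn-runtime-ubuntu22.04 /bin/bash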
@MahmoudAshraf97 This was resolved by using a Dockerfile that I pulled from your repo here https://github.com/SYSTRAN/faster-whisper/tree/master/docker. Thanks for the assistance!
I am running Ubuntu 20.04 with CUDA 12.1. I've been troubleshooting this issue for just over a week and I think I figured out the issue.
When I run the notebook cell
I get this output in my terminal
I recalled having this issue when I was using WSL 2 Ubuntu 22.04. I resolved the issue by installing CUDA 11.7.
Since then I stopped using WSL and installed Ubuntu 20.04 directly on my workstation. Now I'm seeing the same "kernel died" error. I isolated the issue to faster-whisper within
Whisper_Transcription_+_NeMo_Diarization.ipynb
, specifically in the transcription function. I created a new Python environment and installed faster-whisper following the readme in https://github.com/SYSTRAN/faster-whisper. I used the code example from the readme and ran into the same "kernel died" error.
I then ran the command to install CUDA 11.6
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
The CUDA 11.6 installation fixed the "kernel died" error. If I try to install any CUDA version newer than this, the kernel dies. I'm posting here because the whisper-diarization repo requires torch >= 2.0.0, and torch 2.0.0 can only be run with the CUDA 11.7 installation command below.
pip install torch==2.0.0 torchvision==0.15.1 torchaudio==2.0.1
How would I go about getting this to work with the whisper-diarization repo? Is there an older version of the repo that supports torch==1.13.1+cu116 that I can use?