CUDNN_STATUS_VERSION_MISMATCH

TheGermanEngie commented 1 year ago

Hello again -

I built on my past errors and installed the correct version of CUDA to get past where I was stuck at previously. However, I seem to have run into a new problem, specifically in the wav2vec/NeMo stage:

(diarize) omen@omen-PC:~/AI/whisper-diarization$ python diarize.py -a /home/omen/jre_elon.m4a --whisper-model medium.en --device cuda
[NeMo W 2023-07-26 15:44:49 optimizers:54] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-07-26 15:44:50 experimental:27] Module <class 'nemo.collections.asr.modules.audio_modules.SpectrogramToMultichannelFeatures'> is experimental, not ready for production and is not fully supported. Use at your own risk.
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /home/omen/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:05<00:00, 15.0MB/s]
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /home/omen/AI/whisper-diarization/temp_outputs/htdemucs
Separating track /home/omen/jre_elon.m4a
Killed
WARNING:root:Source splitting failed, using original audio file. Use --no-stem argument to disable it.
Downloading (…)1d2350ce/config.json: 100%|█████████████████████████████████████████████████████████████████████████████| 2.64k/2.64k [00:00<00:00, 818kB/s]
Downloading (…)350ce/tokenizer.json: 100%|████████████████████████████████████████████████████████████████████████████| 2.13M/2.13M [00:00<00:00, 11.9MB/s]
Downloading (…)350ce/vocabulary.txt: 100%|██████████████████████████████████████████████████████████████████████████████| 422k/422k [00:00<00:00, 2.31MB/s]
Downloading model.bin: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1.53G/1.53G [01:43<00:00, 14.8MB/s]
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth" to /home/omen/.cache/torch/hub/checkpoints/wav2vec2_fairseq_base_ls960_asr_ls960.pth
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 360M/360M [00:25<00:00, 14.6MB/s]
[NeMo W 2023-07-26 16:07:55 nemo_logging:349] /home/omen/AI/whisper-diarization/diarize.py:104: UserWarning: PySoundFile failed. Trying audioread instead.
      signal, sample_rate = librosa.load(vocal_target, sr=None)

[NeMo W 2023-07-26 16:07:55 nemo_logging:349] /home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
        Deprecated as of librosa version 0.10.0.
        It will be removed in librosa version 1.0.
      y, sr_native = __audioread_load(path, offset, duration, dtype)

100% [................................................................................] 7336 / 7336
[NeMo I 2023-07-26 16:08:11 msdd_models:1092] Loading pretrained diar_msdd_telephonic model from NGC
[NeMo I 2023-07-26 16:08:11 cloud:68] Downloading from: https://api.ngc.nvidia.com/v2/models/nvidia/nemo/diar_msdd_telephonic/versions/1.0.1/files/diar_msdd_telephonic.nemo to /home/omen/.cache/torch/NeMo/NeMo_1.17.0/diar_msdd_telephonic/3c3697a0a46f945574fa407149975a13/diar_msdd_telephonic.nemo
100% [......................................................................] 107609008 / 107609008
[NeMo I 2023-07-26 16:08:19 common:913] Instantiating model from pre-trained checkpoint
[NeMo W 2023-07-26 16:08:20 modelPT:161] If you intend to do training or fine-tuning, please call the ModelPT.setup_training_data() method and provide a valid configuration file to setup the train data loader.
    Train config : 
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: true

[NeMo W 2023-07-26 16:08:20 modelPT:168] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s). 
    Validation config : 
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false

[NeMo W 2023-07-26 16:08:20 modelPT:174] Please call the ModelPT.setup_test_data() or ModelPT.setup_multiple_test_data() method and provide a valid configuration file to setup the test data loader(s).
    Test config : 
    manifest_filepath: null
    emb_dir: null
    sample_rate: 16000
    num_spks: 2
    soft_label_thres: 0.5
    labels: null
    batch_size: 15
    emb_batch_size: 0
    shuffle: false
    seq_eval_mode: false

[NeMo I 2023-07-26 16:08:20 features:287] PADDING: 16
[NeMo I 2023-07-26 16:08:20 features:287] PADDING: 16
Traceback (most recent call last):
  File "/home/omen/AI/whisper-diarization/diarize.py", line 111, in <module>
    msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to(args.device)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/collections/asr/models/msdd_models.py", line 991, in __init__
    self._init_msdd_model(cfg)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/collections/asr/models/msdd_models.py", line 1093, in _init_msdd_model
    self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/core/classes/common.py", line 852, in from_pretrained
    instance = class_.restore_from(
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/core/classes/modelPT.py", line 436, in restore_from
    instance = cls._save_restore_connector.restore_from(
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/core/connectors/save_restore_connector.py", line 239, in restore_from
    loaded_params = self.load_config_and_state_dict(
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/nemo/core/connectors/save_restore_connector.py", line 163, in load_config_and_state_dict
    instance = instance.to(map_location)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 54, in to
    return super().to(*args, **kwargs)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 202, in _apply
    self._init_flat_weights()
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 139, in _init_flat_weights
    self.flatten_parameters()
  File "/home/omen/miniconda3/envs/diarize/lib/python3.9/site-packages/torch/nn/modules/rnn.py", line 190, in flatten_parameters
    torch._cudnn_rnn_flatten_weight(
RuntimeError: cuDNN error: CUDNN_STATUS_VERSION_MISMATCH

I have installed:
- CUDA 11.8
- PyTorch 2.0.1 for CUDA 11.x
- cuDNN 8.9.3.28 (July 11th release)

Also, I had to create a conda env with Python 3.9 to install the dependencies. Maybe the APIs are changing too fast and breaking things.
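Before reinstalling anything, it can help to see which CUDA and cuDNN versions PyTorch itself reports inside the diarize env, since the system-wide cuDNN 8.9.3.28 is not necessarily the copy PyTorch loads. A minimal sketch, assuming PyTorch is importable in that env; report() and decode_cudnn_version() are hypothetical helper names, not part of whisper-diarization. It only shows the version of the main cuDNN library PyTorch picked up, so it is a quick sanity check rather than proof that every cuDNN library on the search path matches.

```python
from typing import Optional, Tuple


def decode_cudnn_version(v: int) -> Tuple[int, int, int]:
    """Turn the integer from torch.backends.cudnn.version() into (major, minor, patch)."""
    if v < 10000:
        # cuDNN 7.x/8.x encoding: MAJOR*1000 + MINOR*100 + PATCH (e.g. 8700 -> 8.7.0)
        return v // 1000, (v % 1000) // 100, v % 100
    # cuDNN 9.x encoding: MAJOR*10000 + MINOR*100 + PATCH (e.g. 90100 -> 9.1.0)
    return v // 10000, (v % 10000) // 100, v % 100


def report() -> None:
    """Print the torch / CUDA / cuDNN versions this Python environment sees at runtime."""
    # Imported lazily so decode_cudnn_version() can be used without torch installed.
    import torch
    import torch.backends.cudnn

    print("torch:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    try:
        v: Optional[int] = torch.backends.cudnn.version()
    except RuntimeError as err:
        # torch raises here if the runtime cuDNN is incompatible with the one it was built against.
        print("cuDNN check failed:", err)
        return
    if v is None:
        print("cuDNN: not available to torch")
    else:
        print("cuDNN loaded by torch: %d.%d.%d" % decode_cudnn_version(v))
```

Calling report() from a Python prompt inside the activated diarize env prints the versions; if the cuDNN line does not say 8.9.3, PyTorch is using a different cuDNN than the one installed by hand.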

TheGermanEngie commented 1 year ago

On an unrelated note, your Colab notebook does not work anymore either. It throws a "No module named 'wget'" error in the third cell of "Installing dependencies":

ModuleNotFoundError                       Traceback (most recent call last)
in <cell line: 2>()
      1 import os
----> 2 import wget
      3 from omegaconf import OmegaConf
      4 import json
      5 import shutil

ModuleNotFoundError: No module named 'wget'
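The direct fix is to install the missing package in the Colab runtime (!pip install wget) before that cell runs. If the notebook only uses wget.download(url, out) to fetch files (an assumption; the rest of the cell is not shown here), a standard-library stand-in also works. A minimal sketch; fetch() is a hypothetical name that only mirrors the wget.download call shape.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional


def fetch(url: str, out: Optional[str] = None) -> str:
    """Download url and return the path it was saved to.

    Hypothetical stand-in mirroring wget.download(url, out): out may be a file path,
    an existing directory, or None (save into the current directory).
    """
    # Derive a file name from the last component of the URL path.
    name = os.path.basename(urllib.parse.urlparse(url).path) or "download"
    if out is None:
        target = name
    elif os.path.isdir(out):
        target = os.path.join(out, name)
    else:
        target = out
    urllib.request.urlretrieve(url, target)
    return target
```

With this in place, calls of the form wget.download(url, out) can be pointed at fetch(url, out), removing the dependency on the wget package entirely.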

filmo commented 1 year ago

I had the exact same issue. What worked for me was an absolutely clean install of CUDA 11.8 and cuDNN 8.7.0, following this gist:

https://gist.github.com/MihailCosmin/affa6b1b71b43787e9228c25fe15aeba

See my comment on that gist about running sudo apt upgrade, which I needed to get it working on my Ubuntu 22.04 setup.

After doing so, I was able to run this repository.

MahmoudAshraf97 commented 1 year ago

Hi all, since PyTorch now installs cuDNN through pip, that copy will conflict with any cuDNN version you previously installed by other means. The solution is:

pip uninstall nvidia-cudnn-cu11
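To check whether an environment is in the situation described in this comment, you can query pip's installed-package metadata for an nvidia-cudnn-cu11 wheel and for whether the installed torch distribution declares a pip cuDNN dependency. A minimal sketch using only importlib.metadata from the standard library; installed_version(), cudnn_requirements() and summarize() are hypothetical helper names.

```python
from importlib import metadata
from typing import List, Optional


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def cudnn_requirements(dist: str = "torch") -> List[str]:
    """List the nvidia-cudnn-* requirement strings declared by an installed distribution."""
    try:
        reqs = metadata.requires(dist) or []
    except metadata.PackageNotFoundError:
        return []
    return [r for r in reqs if r.lower().startswith("nvidia-cudnn")]


def summarize() -> None:
    """Print what pip knows about cuDNN in the current environment."""
    print("nvidia-cudnn-cu11 wheel:", installed_version("nvidia-cudnn-cu11") or "not installed")
    print("torch declares:", cudnn_requirements() or "no pip cuDNN dependency")
```

If summarize() shows both a pip cuDNN wheel and a separately installed cuDNN on the system, that is the conflict described here. After uninstalling the wheel, pip check may report torch's declared dependency as missing, and PyTorch then relies on the other cuDNN install being found at runtime.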

v-nhandt21 commented 10 months ago

Try:

conda install cudatoolkit=11.8 cudnn=8.9.2.26

MahmoudAshraf97 / whisper-diarization

CUDNN_STATUS_VERSION_MISMATCH #69