Error converting audio for NeMo compatibility?

ZQ-Dev8 commented 1 year ago

Hello Mahmoud. First off, thank you for this awesome contribution to the community. I've been trying to get reliable diarization with whisper for months, so I'm excited to try your implementation.

However, I hit the following error when running your diarize.py script on an audio file from YouTube:

(MA97_whisper_diarization) PS C:\-\-\-\-\repos\whisper-diarization> python diarize.py -a ..\..\audio\lotr_trailer.wav
[NeMo W 2023-02-07 16:30:09 optimizers:55] Apex was not found. Using the lamb or fused_adam optimizer will error out.
[NeMo W 2023-02-07 16:30:10 nemo_logging:349] C:\-\-\Miniconda3\envs\MA97_whisper_diarization\lib\site-packages\torch\jit\annotations.py:309: UserWarning: TorchScript will treat type annotations of Tensor dtype-specific subtypes as if they are normal Tensors. dtype constraints are not enforced in compilation either.
      warnings.warn("TorchScript will treat type annotations of Tensor "

Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
Source splitting failed, using original audio file. Use --no-stem argument to disable it.
100%|██████████████████████████████████████████████████████████████████████████████| 17569/17569 [00:50<00:00, 346.50frames/s]
Downloading: "https://download.pytorch.org/torchaudio/models/wav2vec2_fairseq_base_ls960_asr_ls960.pth" to C:\-\-/.cache\torch\hub\checkpoints\wav2vec2_fairseq_base_ls960_asr_ls960.pth
100%|███████████████████████████████████████████████████████████████████████| 360M/360M [00:05<00:00, 69.7MB/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\-\-\-\-\repos\whisper-diarization\diarize.py:115 in       │
│ <module>                                                                                         │
│                                                                                                  │
│   112                                                                                            │
│   113 # convert audio to mono for NeMo combatibility                                             │
│   114 signal, sample_rate = librosa.load(vocal_target, sr=None)                                  │
│ ❱ 115 os.chdir("temp_outputs")                                                                   │
│   116 soundfile.write("mono_file.wav", signal, sample_rate, "PCM_24")                            │
│   117                                                                                            │
│   118 # Initialize NeMo MSDD diarization model                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'temp_outputs'

Any recommendations for a fix?

MahmoudAshraf97 commented 1 year ago

Hi @dcruiz01, I updated the script yesterday to fix this problem, so please download the latest version and let me know if the error still occurs.

ZQ-Dev8 commented 1 year ago

Hey @MahmoudAshraf97. I pulled the latest script and tried it, but got the same error. I managed to fix it by replacing line 115 with:

if not os.path.exists(temp_path):
    os.mkdir(temp_path)
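
For anyone hitting the same FileNotFoundError, here is a minimal sketch of that guard as a standalone helper. It assumes `temp_path` points at the `temp_outputs` directory named in the traceback (the snippet above does not show how it is defined), and it swaps the exists-then-mkdir check for `os.makedirs(..., exist_ok=True)`. The helper name `ensure_temp_dir` is hypothetical and not part of diarize.py.

```python
import os


def ensure_temp_dir(temp_path):
    """Create temp_path (and any missing parents) and return its absolute path.

    exist_ok=True makes the call a no-op when the directory already exists,
    so no separate os.path.exists() check is needed.
    """
    os.makedirs(temp_path, exist_ok=True)
    return os.path.abspath(temp_path)


if __name__ == "__main__":
    # Assumption: "temp_outputs" is the directory name taken from the traceback.
    temp_dir = ensure_temp_dir(os.path.join(os.getcwd(), "temp_outputs"))
    print("Using temp directory:", temp_dir)
```

Calling the helper before any `os.chdir` or file write into the directory should avoid the error regardless of whether an earlier step created it.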

This got me past the error, but now I have a different error with an enormous trace. The error starts with \lib\multiprocessing\spawn.py in the environment I built for this. Have you seen this before?

Let me know if you want me to open a pull request to fix the temp_path error above.

MahmoudAshraf97 commented 1 year ago

@dcruiz01 No need for the pull request, I've already implemented this fix. Thanks! Can you show me a gist of your new error?

ZQ-Dev8 commented 1 year ago

@MahmoudAshraf97 Thank you for your patience. Here's a link to the gist showing my error on both Windows and Linux. I would really like to get this project working, so any assistance you can provide would be awesome.

ZQ-Dev8 commented 1 year ago

I think I finally got it working. The solution was two-fold. First, I had to switch from Windows to Linux, which should have been obvious (?). Second, I had some dependencies that weren't covered by your requirements.txt file. Namely, I had to install Cython via pip, and gcc and g++ via apt-get. You might consider updating your documentation for those working from a fresh Linux image, like I was. Closing this now.
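
In case it helps others starting from a fresh image, below is a rough sketch of a pre-flight check for the extra dependencies mentioned above. It only reports what it cannot find and does not install anything. The function name `missing_prerequisites` and its default lists are illustrative assumptions, not part of the repository.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("Cython",), tools=("gcc", "g++")):
    """Return the names of Python modules and command-line tools that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    # shutil.which returns None when the executable is not on PATH.
    missing += [name for name in tools if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All listed prerequisites were found.")
```

Anything it reports would then be installed the way described above: Cython via pip, and gcc and g++ via apt-get.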

MahmoudAshraf97 / whisper-diarization

Error converting audio for NeMo compatibility? #2