EtienneAb3d / WhisperHallu

Experimental code: sound file preprocessing to optimize Whisper transcriptions without hallucinated texts
275 stars, 22 forks

Should a GPU help this algorithm go faster or no? #19

Open jsteinberg-rbi opened 1 year ago

jsteinberg-rbi commented 1 year ago

So from what I've seen, when the script runs it tries to use the GPU if one is present, which of course is great. In fact, I think that's even the default. For whatever reason, though, it doesn't run on the GPU on my NVIDIA A100. I have no issues running whisper ... --device cuda; it works great and cuts the runtime of my transcription by an order of magnitude. I wish I could get the same result with Hallu. What am I missing? Thanks! Let me know if you want any other information from me.
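A quick sanity check for this kind of problem, offered as a sketch rather than anything WhisperHallu itself provides: standard Whisper, Demucs and the Silero VAD model (loaded through torch.hub, as the log below shows) all run on PyTorch, so it is worth confirming that the PyTorch installed in the same environment as hallu.py can see the GPU. The helper name cuda_report is hypothetical; torch.cuda.is_available(), torch.version.cuda and torch.cuda.get_device_name() are standard PyTorch calls. FasterWhisper runs on CTranslate2 rather than PyTorch, so this check does not cover it.

```python
# Hypothetical diagnostic helper (not part of WhisperHallu): reports whether
# the PyTorch in this interpreter can use a CUDA GPU.
def cuda_report() -> dict:
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter at all.
        return {"torch": None, "cuda_build": None,
                "cuda_available": False, "device": "cpu"}
    available = torch.cuda.is_available()
    return {
        "torch": torch.__version__,
        # None here means a CPU-only PyTorch build, which can never use the GPU.
        "cuda_build": torch.version.cuda,
        "cuda_available": available,
        "device": torch.cuda.get_device_name(0) if available else "cpu",
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

Run it with the same python that launches hallu.py. If cuda_available comes back False while the whisper CLI works with --device cuda, the two are probably running in different environments.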

EtienneAb3d commented 1 year ago

WhisperHallu uses Whisper or FasterWhisper out of the box, without any modifications. I don't understand why they didn't use your GPU.

jsteinberg-rbi commented 1 year ago

@EtienneAb3d

Hey, thanks for the prompt response! Whisper and FasterWhisper will use the GPU, but what about ffmpeg, Demucs, etc.? Are those going to take forever? I had figured that running your algorithm on a GPU would make all the preprocessing that prevents Whisper hallucinations go a lot faster. I'm using an NVIDIA A100 40GB.

Here's the log so far:

(base) root@instance-2:/home/jsteinberg/WhisperHallu# ls
README.md  data  demucsWrapper.py  hallu.py  markers  transcribeHallu.py
(base) root@instance-2:/home/jsteinberg/WhisperHallu# python hallu.py
Python >= 3.10
/opt/conda/lib/python3.10/site-packages/torch/hub.py:286: UserWarning: You are about to download and run code from an untrusted repository. In a future release, this won't be allowed. To add the repository to your trusted list, change the command to {calling_fn}(..., trust_repo=False) and a command prompt will appear asking for an explicit confirmation of trust, or load(..., trust_repo=True), which will assume that the prompt is to be answered with 'yes'. You can also use load(..., trust_repo='check') which will only prompt for confirmation if the repo is not already trusted. This will eventually be the default behaviour
  warnings.warn(
Downloading: "https://github.com/snakers4/silero-vad/zipball/master" to /root/.cache/torch/hub/master.zip
Using Demucs
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80.2M/80.2M [00:00<00:00, 111MB/s]
/opt/conda/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
Using standard Whisper
LOADING: large-v2 GPU:0 BS: 2
100%|█████████████████████████████████████| 2.87G/2.87G [00:47<00:00, 65.2MiB/s]
LOADED
=====transcribePrompt
PATH=../230821_0020S12.wav
LNGINPUT=en
LNG=en
PROMPT=Whisper, Ok. A pertinent sentence for your purpose in your language. Ok, Whisper. Whisper, Ok. Ok, Whisper. Whisper, Ok. Please find here, an unlikely ordinary sentence. This is to avoid a repetition to be deleted. Ok, Whisper. 
CMD: ffmpeg -y -i "../230821_0020S12.wav"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav" > "../230821_0020S12.wav.WAV.wav.log" 2>&1
T= 10.130795001983643
PATH=../230821_0020S12.wav.WAV.wav
Demucs using device: cuda:0
Source: drums
Source: bass
Source: other
Source: vocals
T= 186.54959273338318
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav
CMD: ffmpeg -y -i "../230821_0020S12.wav.WAV.wav.vocals.wav" -af "silenceremove=start_periods=1:stop_periods=-1:start_threshold=-50dB:stop_threshold=-50dB:start_silence=0.2:stop_silence=0.2, loudnorm"  -c:a pcm_s16le -ar 16000 "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav" > "../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log" 2>&1
T= 58.83332967758179
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
DURATION=7452
/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:1501: UserWarning: operator() profile_node %669 : int[] = prim::profile_ivalue(%667)
 does not have profile information (Triggered internally at ../third_party/nvfuser/csrc/graph_fuser.cpp:104.)
  return forward_call(*args, **kwargs)
T= 27.54055142402649
PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
NOT USING MARKS FOR DURATION > 30s
[0] PATH=../230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
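Reading the log above, each T= line appears to report the elapsed seconds of the step printed just before it. That is an assumption based on the order of the lines, not on the WhisperHallu source. Under that reading, a small tally shows where the preprocessing time went in this run; the stage labels are descriptive names chosen for this sketch.

```python
# Elapsed times copied from the T= lines of the log above. Mapping each value
# to the preceding step is an assumption based on the order of the log lines.
STAGE_SECONDS = {
    "ffmpeg: convert to 16 kHz PCM": 10.130795001983643,
    "Demucs source separation (cuda:0)": 186.54959273338318,
    "ffmpeg: silenceremove + loudnorm": 58.83332967758179,
    "VAD pass": 27.54055142402649,
}


def stage_shares(stage_seconds: dict) -> dict:
    """Return each stage's fraction of the total preprocessing time."""
    total = sum(stage_seconds.values())
    return {name: secs / total for name, secs in stage_seconds.items()}


if __name__ == "__main__":
    total = sum(STAGE_SECONDS.values())
    print(f"total preprocessing: {total:.1f} s")
    for name, share in stage_shares(STAGE_SECONDS).items():
        print(f"{share:6.1%}  {name}")
```

With these numbers the four steps add up to roughly 283 s. Demucs accounts for about 66% of that even though the log says it ran on cuda:0, and the two ffmpeg steps together take about 69 s.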

jsteinberg-rbi commented 1 year ago

Wowza. I got it working.

EtienneAb3d commented 1 year ago

@jsteinberg-rbi Demucs should run on the GPU. I don't think that's possible with ffmpeg, but perhaps there's an option I'm not aware of, at least for some features. What did you do to get it working?
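For reference, the two ffmpeg steps in the log are audio filter chains, and as far as I know silenceremove and loudnorm have no GPU implementation in ffmpeg, so these steps run on the CPU. The sketch below rebuilds both calls as argument lists using the exact options the log prints (the space after the comma in the filter string is dropped). The function names convert_cmd, silcut_cmd and run are hypothetical; passing a list to subprocess.run avoids shell quoting of the file paths.

```python
import subprocess

# Filter chain copied from the SILCUT command in the log above.
SILCUT_FILTER = (
    "silenceremove=start_periods=1:stop_periods=-1"
    ":start_threshold=-50dB:stop_threshold=-50dB"
    ":start_silence=0.2:stop_silence=0.2,loudnorm"
)


def convert_cmd(src: str) -> list[str]:
    """ffmpeg call that re-encodes src as 16 kHz, 16-bit PCM (the .WAV.wav step)."""
    return ["ffmpeg", "-y", "-i", src,
            "-c:a", "pcm_s16le", "-ar", "16000", src + ".WAV.wav"]


def silcut_cmd(src: str) -> list[str]:
    """ffmpeg call that cuts silences and normalizes loudness (the .SILCUT.wav step)."""
    return ["ffmpeg", "-y", "-i", src, "-af", SILCUT_FILTER,
            "-c:a", "pcm_s16le", "-ar", "16000", src + ".SILCUT.wav"]


def run(cmd: list[str], log_path: str) -> None:
    """Run one ffmpeg call, sending stdout and stderr to a log file like '> x.log 2>&1'."""
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)


if __name__ == "__main__":
    print(" ".join(convert_cmd("../230821_0020S12.wav")))
    print(" ".join(silcut_cmd("../230821_0020S12.wav.WAV.wav.vocals.wav")))
```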

jsteinberg-rbi commented 1 year ago

@EtienneAb3d The file I was initially testing with was 4GB, and it would just spin forever. When I switched to a 2GB file, it ran in under 10 minutes :)
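Processing time mostly tracks audio duration, and the same duration can take very different file sizes depending on sample rate and channel count, so it can help to check how long a file is before queueing it. The sketch below reads the duration from a PCM WAV header with Python's standard wave module; write_silence is a hypothetical helper included only so the example runs on its own. One caveat worth checking, offered as a guess and not a diagnosis: the classic RIFF/WAV header stores sizes in 32-bit fields, so files around 4 GB sit at the format's limit and their headers may not describe the data correctly.

```python
import os
import tempfile
import wave


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file computed from its header (frames / frame rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def write_silence(path: str, seconds: float, rate: int = 16000) -> None:
    """Hypothetical helper: write a mono 16-bit PCM WAV of digital silence."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples, like pcm_s16le
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "silence_demo.wav")
    write_silence(path, 2.5)
    print(f"{wav_duration_seconds(path):.2f} s")  # 2.50 s
```

For scale, 16-bit mono audio at 16 kHz is 32,000 bytes per second, about 115 MB per hour; higher sample rates or stereo multiply that.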

Question for you: I ran your script over 30 files last night. Which of these files has the silence removed?

230821_0020S12.wav
230821_0020S12.wav.WAV.wav
230821_0020S12.wav.WAV.wav.bass.wav
230821_0020S12.wav.WAV.wav.drums.wav
230821_0020S12.wav.WAV.wav.log
230821_0020S12.wav.WAV.wav.other.wav
230821_0020S12.wav.WAV.wav.vocals.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav
230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.log

EtienneAb3d commented 1 year ago

@jsteinberg-rbi SILCUT = Silence Cut
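Going by the commands in the log earlier in this thread, each processing step appends its own tag to the file name, so the chain of tags shows which steps a file has been through. The sketch below decodes those tags. The descriptions are inferred from the log (the ffmpeg conversion, the four Demucs sources, the silenceremove/loudnorm command, and the VAD output that follows the Silero VAD download), not from the WhisperHallu source, and the function names are hypothetical. By this reading, the .SILCUT.wav file is the vocals stem with silences cut, and the .VAD.wav file is derived from it.

```python
# Tag -> step, inferred from the log in this thread (not from the source code).
STAGE_TAGS = {
    "WAV": "re-encoded by ffmpeg as 16 kHz pcm_s16le",
    "vocals": "Demucs vocals stem",
    "drums": "Demucs drums stem",
    "bass": "Demucs bass stem",
    "other": "Demucs 'other' stem",
    "SILCUT": "silences cut + loudnorm (ffmpeg)",
    "VAD": "output of the voice activity detection step",
}


def pipeline_stages(filename: str) -> list[str]:
    """Return the stage tags in the order they were applied, e.g. ['WAV', 'vocals']."""
    name = filename
    if name.endswith(".log"):
        # ffmpeg output is redirected to '<output file>.log'
        name = name[: -len(".log")]
    tags = []
    while True:
        stem, dot, ext = name.rpartition(".")
        if not dot or ext != "wav":
            break
        base, dot, tag = stem.rpartition(".")
        if not dot or tag not in STAGE_TAGS:
            break
        tags.append(tag)
        name = base  # drop '.<tag>.wav' and look at the previous step
    return tags[::-1]


def has_silence_removed(filename: str) -> bool:
    """True for audio files produced at or after the SILCUT step (not for .log files)."""
    return not filename.endswith(".log") and "SILCUT" in pipeline_stages(filename)


if __name__ == "__main__":
    for f in ["230821_0020S12.wav",
              "230821_0020S12.wav.WAV.wav.vocals.wav",
              "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav",
              "230821_0020S12.wav.WAV.wav.vocals.wav.SILCUT.wav.VAD.wav"]:
        print(f, pipeline_stages(f), has_silence_removed(f))
```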