This PR integrates WhisperX into our current pipeline. WhisperX enables batched computation, which leads to faster inference. The current status is 25x realtime speed.
Major changes:
- WhisperX integration
- GPU-based sbatch job submission
- Optimized sbatch time limit instead of the default 24 h (see the time-limit sketch below)
- `batch_size` chosen based on available GPU memory to decrease inference time and avoid CUDA OutOfMemory errors (see the batch-size sketch below)
- `ffmpeg` run in a subprocess for faster input file conversion (see the ffmpeg sketch below)
- Pyannote diarization and `WhisperX` transcription parallelized for a 2x pipeline speedup (see the parallelization sketch below)
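
A minimal sketch of how a per-job time limit could be derived from the input length instead of the default 24 h, assuming the 25x realtime figure above plus a hypothetical safety factor. The helper name and both factors are assumptions for illustration, not values taken from this PR:

```python
import math

def sbatch_time_limit(audio_seconds: float,
                      realtime_factor: float = 25.0,
                      safety_factor: float = 4.0,
                      min_minutes: int = 10) -> str:
    """Return an sbatch --time value (HH:MM:SS) sized to the input audio."""
    # Expected runtime at 25x realtime, padded by a safety factor and floored
    # at a minimum so very short inputs still get a workable allocation.
    minutes = math.ceil(audio_seconds / realtime_factor / 60 * safety_factor)
    minutes = max(minutes, min_minutes)
    return f"{minutes // 60:02d}:{minutes % 60:02d}:00"
```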
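
A minimal sketch of picking `batch_size` from free GPU memory; the function name and the memory thresholds are illustrative assumptions, not the values used in this PR:

```python
import torch

def pick_batch_size(default: int = 16) -> int:
    # Fall back to a conservative default when no GPU is visible.
    if not torch.cuda.is_available():
        return default
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gib = free_bytes / 1024**3
    # Illustrative thresholds: more free memory allows larger batches.
    if free_gib >= 32:
        return 64
    if free_gib >= 16:
        return 32
    if free_gib >= 8:
        return 16
    return 8
```

The resulting value would be passed as the `batch_size` argument to the WhisperX transcribe call; capping it to free memory is what avoids the CUDA OutOfMemory errors mentioned above.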
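
A minimal sketch of the ffmpeg conversion step, assuming the target format is 16 kHz mono WAV (the sample rate Whisper-family models consume); the function name is hypothetical:

```python
import subprocess

def convert_to_wav(src: str, dst: str) -> None:
    # -y: overwrite output; -ac 1: downmix to mono; -ar 16000: resample to 16 kHz.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst],
        check=True,
        capture_output=True,
    )
```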
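
A minimal sketch of running the two stages concurrently with a thread pool; `transcribe` and `diarize` are hypothetical stand-ins for the WhisperX and Pyannote calls. Threads suffice here because both stages spend their time in GPU kernels and native code rather than holding the Python GIL:

```python
from concurrent.futures import ThreadPoolExecutor

def run_stages(audio_path: str, transcribe, diarize):
    # Submit both stages at once; result() blocks until each finishes.
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcription = pool.submit(transcribe, audio_path)
        diarization = pool.submit(diarize, audio_path)
        return transcription.result(), diarization.result()
```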
Required Files
WhisperX requires this VAD model to be available in the `TORCH_HOME` directory. `TORCH_HOME` can be queried via `torch.hub._get_torch_home()`; in the `speech2text` module it points to `/scratch/shareddata/speech2text`.
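
A quick way to check where torch will look, assuming the environment variable is set before the model is loaded:

```python
import os
import torch.hub

# Point the torch cache at the shared directory used by speech2text.
os.environ["TORCH_HOME"] = "/scratch/shareddata/speech2text"

# torch resolves its cache dir from TORCH_HOME (default: ~/.cache/torch);
# the VAD model file must be present under this directory.
print(torch.hub._get_torch_home())
```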
Environment Variables
For `SPEECH2TEXT_CPUS_PER_TASK`, 6 is enough as we are using