Closed: Veldhoen closed this issue 2 months ago
Parameters to experiment with that apply to all implementations:
Labelled data has been benchmarked; unlabelled data is still to go.
Small mistake: for WhisperX, the diarization model was loaded for each file instead of just once (the alignment model does need to be loaded per file, since it depends on the file's detected language).
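The fix is to hoist the language-independent diarization model out of the per-file loop, while the language-specific alignment model stays inside it. A minimal sketch of that pattern, using hypothetical stub loaders (the real WhisperX loader names and signatures are not reproduced here) and a counter to show how often each load happens:

```python
# Count how often each (stubbed) model loader is called.
load_calls = {"diarize": 0, "align": 0}

def load_diarization_model():
    """Stand-in for loading the diarization model (language-independent)."""
    load_calls["diarize"] += 1
    return "diarization-model"

def load_align_model(language):
    """Stand-in for loading the alignment model (language-specific)."""
    load_calls["align"] += 1
    return f"align-model-{language}"

# Hypothetical benchmark files with their detected languages.
files = [("a.wav", "en"), ("b.wav", "nl"), ("c.wav", "en")]

# Load the diarization model ONCE, outside the loop.
diarize_model = load_diarization_model()

for path, language in files:
    # The alignment model depends on the detected language, so it is
    # loaded per file (it could also be cached per language).
    align_model = load_align_model(language)
    # ... transcribe, align, and diarize `path` here ...

print(load_calls)  # diarization loaded once, alignment once per file
```

Caching the alignment model per language (rather than per file) would shave a bit more time off runs where most files share one language.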
Re-benchmarking faster-whisper (both labelled and unlabelled), because the time to transcribe should be measured from the point when model.transcribe is called until the output of the function is saved to a JSON file.
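This measurement boundary matters because faster-whisper's model.transcribe returns segments lazily (as a generator), so most of the work happens while the output is being consumed, not at the call itself. A minimal sketch of the timing harness, with a stand-in generator in place of the real model (the segment data and function names are hypothetical):

```python
import json
import tempfile
import time
from pathlib import Path

def fake_transcribe(audio_path):
    """Stand-in for the real transcribe call. Like faster-whisper's
    model.transcribe, it yields segments lazily, so little work
    happens until the generator is consumed."""
    for i in range(3):
        yield {"start": 2.0 * i, "end": 2.0 * i + 2.0, "text": f"segment {i}"}

def timed_transcribe_to_json(audio_path, out_path):
    # Start the clock at the transcribe call...
    t0 = time.perf_counter()
    segments = fake_transcribe(audio_path)
    # ...consume the lazy generator (this is where the real work happens)...
    result = [dict(seg) for seg in segments]
    # ...and stop only once the JSON output is on disk.
    Path(out_path).write_text(json.dumps(result, indent=2))
    return time.perf_counter() - t0

out_path = Path(tempfile.gettempdir()) / "audio.json"
elapsed = timed_transcribe_to_json("audio.wav", out_path)
print(f"transcribe-to-JSON: {elapsed:.3f}s -> {out_path}")
```

Timing only the model.transcribe call itself would make faster-whisper look nearly instantaneous and unfairly fast compared to implementations that return fully materialized results.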
Also re-benchmarking WhisperX for the reason mentioned in the previous comment (expecting less time spent per file).
Benchmark with the same setup for a fair comparison: