Closed: Veldhoen closed this issue 2 months ago
Parameters to experiment with that apply to all implementations:
Labelled data has been benchmarked; unlabelled data is still to go.
Small mistake: for WhisperX, the diarization model was loaded for each file instead of just once (the alignment model does need to be loaded per file, since it depends on the file's detected language).
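The fix is to hoist the language-independent diarization model out of the per-file loop, while the language-specific alignment model stays inside it. A minimal sketch of that pattern, using hypothetical stub loaders (the real WhisperX loader names and signatures are not reproduced here) and a counter to show how often each load happens:

```python
# Count how often each (stubbed) model loader is called.
load_calls = {"diarize": 0, "align": 0}

def load_diarization_model():
    """Stand-in for loading the diarization model (language-independent)."""
    load_calls["diarize"] += 1
    return "diarization-model"

def load_align_model(language):
    """Stand-in for loading the alignment model (language-specific)."""
    load_calls["align"] += 1
    return f"align-model-{language}"

# Hypothetical benchmark files with their detected languages.
files = [("a.wav", "en"), ("b.wav", "nl"), ("c.wav", "en")]

# Load the diarization model ONCE, outside the loop.
diarize_model = load_diarization_model()

for path, language in files:
    # The alignment model depends on the detected language, so it is
    # loaded per file (it could also be cached per language).
    align_model = load_align_model(language)
    # ... transcribe, align, and diarize `path` here ...

print(load_calls)  # diarization loaded once, alignment once per file
```

Caching the alignment model per language (rather than per file) would shave a bit more time off runs where most files share one language.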
Re-benchmarking faster-whisper (both labelled and unlabelled), because the time to transcribe should be measured from the point when model.transcribe is called until the output of the function is saved to a JSON file.
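This measurement boundary matters because faster-whisper's model.transcribe returns segments lazily (as a generator), so most of the work happens while the output is being consumed, not at the call itself. A minimal sketch of the timing harness, with a stand-in generator in place of the real model (the segment data and function names are hypothetical):

```python
import json
import tempfile
import time
from pathlib import Path

def fake_transcribe(audio_path):
    """Stand-in for the real transcribe call. Like faster-whisper's
    model.transcribe, it yields segments lazily, so little work
    happens until the generator is consumed."""
    for i in range(3):
        yield {"start": 2.0 * i, "end": 2.0 * i + 2.0, "text": f"segment {i}"}

def timed_transcribe_to_json(audio_path, out_path):
    # Start the clock at the transcribe call...
    t0 = time.perf_counter()
    segments = fake_transcribe(audio_path)
    # ...consume the lazy generator (this is where the real work happens)...
    result = [dict(seg) for seg in segments]
    # ...and stop only once the JSON output is on disk.
    Path(out_path).write_text(json.dumps(result, indent=2))
    return time.perf_counter() - t0

out_path = Path(tempfile.gettempdir()) / "audio.json"
elapsed = timed_transcribe_to_json("audio.wav", out_path)
print(f"transcribe-to-JSON: {elapsed:.3f}s -> {out_path}")
```

Timing only the model.transcribe call itself would make faster-whisper look nearly instantaneous and unfairly fast compared to implementations that return fully materialized results.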
Also re-benchmarking WhisperX for the reason mentioned in the previous comment (expecting less time spent per file).
Benchmark with the same setup for a fair comparison: