elixir-nx / bumblebee

Pre-trained Neural Network models in Axon (+ 🤗 Models integration)
Apache License 2.0

Confidences/probabilities for Whisper results #335

Open zacharygraber opened 4 months ago

zacharygraber commented 4 months ago

Hi friends 👋. Bumblebee is an amazing project, and I'm excited about the prospect of integrating it into my Phoenix LiveView web app.

Description of Problem

speech_to_text_whisper_chunk only exposes the raw text, start time, and end time of each chunk. There is nothing comparable to the per-segment avg_logprob that the Python-native Whisper API returns, and no easy way to replicate it.
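For context, here is a rough sketch of what a per-segment avg_logprob represents, assuming the per-token log-probabilities of a segment were available. The function names and the sample values are hypothetical, and the reference Whisper implementation may normalize slightly differently (for example, by counting the end-of-text token).

```python
import math


def avg_logprob(token_logprobs):
    """Mean log-probability of the tokens in one segment."""
    if not token_logprobs:
        raise ValueError("segment has no tokens")
    return sum(token_logprobs) / len(token_logprobs)


def confidence(token_logprobs):
    """Geometric-mean token probability, a value in (0, 1]."""
    return math.exp(avg_logprob(token_logprobs))


# Hypothetical per-token log-probabilities for a three-token segment.
example_logprobs = [math.log(0.9), math.log(0.8), math.log(0.5)]
```

Exponentiating the average gives the geometric mean of the token probabilities, which is easier to read as a confidence score than the raw log value.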

Opportunity Statement (example use case)

AI-generated transcripts are getting better, but they often still need to be cleaned up by a human before they can be used in a professional or research setting. That cleanup is much more efficient if attention can be directed to the places where the model was least confident in its output.

For example, I'd like to use the confidences/probabilities to return transcripts to users in a .docx format, where tokens/segments with low confidence are highlighted.
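A minimal sketch of that use case, assuming each segment carried an avg_logprob alongside its text. The segment shape, the function names, and the -1.0 cutoff are assumptions for illustration, and plain-text brackets stand in for .docx highlighting.

```python
def flag_low_confidence(segments, threshold=-1.0):
    """Pair each segment's text with True if its avg_logprob is below threshold."""
    return [(seg["text"], seg["avg_logprob"] < threshold) for seg in segments]


def render_with_markers(segments, threshold=-1.0):
    """Join segment texts, wrapping low-confidence ones in [[...]]."""
    parts = []
    for text, is_low in flag_low_confidence(segments, threshold):
        parts.append(f"[[{text}]]" if is_low else text)
    return " ".join(parts)


# Hypothetical segments; a real transcript would supply these values.
segments = [
    {"text": "Hello there.", "avg_logprob": -0.2},
    {"text": "Mumbled bit.", "avg_logprob": -1.4},
]
```

A human reviewer could then jump straight to the bracketed spans instead of rereading the whole transcript.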