Closed · jkrukowski closed this 7 months ago
@ZachNagengast thanks for your comments, added some changes, lmk what you think
> I'm curious about the `Transcriber` protocol, can you elaborate on your use case for that?
In order to do the transcription in `AudioStreamTranscriber` I'd need to pass the whole `WhisperKit` class there. I'd rather pass an object that contains just the methods I need, and the `Transcriber` protocol could be a first step towards that. I imagine a separate class that implements the `Transcriber` protocol and contains the `transcribe` methods that currently live in `WhisperKit`. This way I could (rough sketch below):

- pass a `Transcriber` object to `AudioStreamTranscriber` and not the whole `WhisperKit` class
- reuse `Transcriber` with other "streamers" (e.g. `RemoteAudioStreamTranscriber`) in the future
- mock `Transcriber` in tests
- keep the `WhisperKit` class smaller and more focused on being an entry point for the library
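For illustration, a minimal sketch of that split (the `DecodingOptions`/`TranscriptionResult` stand-ins and the exact method signature are illustrative, not the current `WhisperKit` API):

```swift
import Foundation

// Stand-ins for WhisperKit's real option/result types, only so this
// sketch is self-contained.
struct DecodingOptions {}
struct TranscriptionResult { let text: String }

// A narrow protocol exposing just the transcription surface that
// AudioStreamTranscriber needs, instead of the whole WhisperKit class.
protocol Transcriber {
    func transcribe(audioArray: [Float], decodeOptions: DecodingOptions?) async throws -> TranscriptionResult?
}

// In the library, WhisperKit (or a small class wrapping it) would conform:
// extension WhisperKit: Transcriber { ... }

// In tests, a mock can stand in for the whole model pipeline.
final class MockTranscriber: Transcriber {
    func transcribe(audioArray: [Float], decodeOptions: DecodingOptions?) async throws -> TranscriptionResult? {
        TranscriptionResult(text: "mocked transcription")
    }
}
```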
Since we have this new `AudioWarper` here now, do you think there's any recording code in `AudioProcessor` that would fit into here as well? Would be nice to have a few debug logs from `Logging.debug` in this section as well.
added more logging, changed name to `AudioStreamTranscriber`
> One other thing: I think the microphone streaming should be explicit in `swift run transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3"`, such as a `--stream` boolean argument. Reason for that is ideally giving people a heads up if they forgot to include `--audio-path`, and only requesting the microphone if we are sure they want to stream.
added `--stream` flag
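For reference, a sketch of how the flag could look, assuming the CLI command is built on swift-argument-parser (the option set shown here is illustrative):

```swift
import ArgumentParser

struct Transcribe: AsyncParsableCommand {
    @Option(help: "Path to the folder containing the CoreML models")
    var modelPath: String

    @Option(help: "Path to the audio file to transcribe")
    var audioPath: String?

    @Flag(help: "Stream and transcribe audio from the microphone in real time")
    var stream = false

    func run() async throws {
        guard stream || audioPath != nil else {
            // Give a heads up instead of silently requesting microphone access.
            throw ValidationError("Provide --audio-path, or pass --stream to transcribe from the microphone.")
        }
        // ... load models, then transcribe the file or start streaming ...
    }
}
```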
> This should allow significant cleanup for the example apps with this shared interface, but that can happen separately, nicely done!
I could work on this cleanup ofc
Alright, tried this out and have just some minor tweaks to the UI:

- When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc
- It would be ideal to find a way to not print to the CLI every loop when not in `--verbose` mode; it makes it hard to use as a piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with `currentText` for a more live output. Lmk what you think.

Everything else looks good, I'll try to help with this as well after word timestamps.
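A hypothetical sketch of the "replace the current line" idea (the `printLive` helper below is illustrative, not part of this PR):

```swift
import Foundation

// Rewrite the current terminal line in place instead of appending a new
// line on every loop iteration.
func printLive(_ text: String) {
    // "\r" moves the cursor back to the start of the line; padding with
    // spaces clears leftovers from a longer previous line.
    let width = max(text.count, 80)
    let padded = text.padding(toLength: width, withPad: " ", startingAt: 0)
    print("\r\(padded)", terminator: "")
    fflush(stdout)
}
```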
> - When running initially, it should output some info about the model status to the CLI, such as "Loading models..." etc

I'm using the model loading function from `WhisperKit`, should I change the `Logging` to `print` there? Or did you have something else in mind?
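One hypothetical option, just as a sketch: keep `Logging` inside the library untouched and print the status from the CLI around the load call (the `WhisperKit` initializer parameters shown are illustrative):

```swift
import WhisperKit

func loadPipeline(modelPath: String) async throws -> WhisperKit {
    print("Loading models...")
    // Parameter name is illustrative; whatever initializer the CLI
    // already uses for loading would go here.
    let pipe = try await WhisperKit(modelFolder: modelPath)
    print("Models loaded.")
    return pipe
}
```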
> - It would be ideal to find a way to not print to the CLI every loop when not in `--verbose` mode; it makes it hard to use as a piped input to other CLI commands (like outputting stdout to a file). Instead, it could only print the new unconfirmed segments, or possibly even replace the current line with `currentText` for a more live output. Lmk what you think.
changed the state change callback in `AudioStreamTranscriber`, so right now it's going to print only if `currentText` or the unconfirmed or confirmed segments have changed, would that be ok?
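i.e. roughly along these lines (the field names are simplified stand-ins for the actual `AudioStreamTranscriber` state):

```swift
// Simplified stand-in for the streaming state; the real state tracks
// segment arrays rather than plain strings.
struct StreamState: Equatable {
    var currentText = ""
    var confirmedText = ""
    var unconfirmedText = ""
}

var lastPrintedState = StreamState()

func stateDidChange(to newState: StreamState) {
    // Only touch stdout when something visible actually changed, so the
    // output stays usable when piped to a file or another command.
    guard newState != lastPrintedState else { return }
    lastPrintedState = newState
    print(newState.confirmedText + newState.unconfirmedText + newState.currentText)
}
```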
Great work @jkrukowski and @ZachNagengast! We can work on improving the output formatting next week on a separate PR
This PR:

- added ~~`AudioWarper`~~ `AudioStreamTranscriber` (based on the streaming logic in `ContentView.swift`) to combine streaming audio from the microphone, processing it, and transcribing it in real-time
- ~~added a `Transcriber` protocol to expose `WhisperKit` transcribe methods~~
- ~~`AudioWarper`~~ `AudioStreamTranscriber` can replace the streaming logic in the example app as well

Resolves: https://github.com/argmaxinc/WhisperKit/issues/25