argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License

Instructions for running the cli version? #140

Closed jrp2014 closed 6 months ago

jrp2014 commented 6 months ago

Is the word whisperkit-cli missing from the README?

swift run whisperkit-cli transcribe --model-path "Models/whisperkit-coreml/openai_whisper-large-v3" --audio-path ~/.cache/whisper/alice.mp3 
Building for debugging...
[1/1] Write swift-version--58304C5D6DBC2206.txt
Build complete! (0.09s)

If I don't include it, I get error: no executable product named 'transcribe'.

Transcription seems to be pretty slow, with no use of the GPU.

The output is a wall of text, with some capitalisation anomalies.

With mlx whisper, you can add timestamps to the output so that, if two people are speaking, each change of speaker starts on a new line of the transcript. Is the same capability available here?

I'm not sure which MP3 formats are supported. I got an error from a stereo 44.1 kHz .mp3 file: Error when transcribing /Users/xxx.mp3: loadAudioFailed("Unable to resample audio")

I'm not sure whether I'm using the large-v3 model for 30s clips, or the one for full-length transcripts.

ZachNagengast commented 6 months ago

It's mentioned in the readme here: https://github.com/argmaxinc/WhisperKit?tab=readme-ov-file#swift-cli

Did you see somewhere else that had swift run transcribe? We will update if you can point us to it.

Regarding the timestamps, we do have a clipTimestamps parameter in the Swift library, but it's not currently in the CLI. Making a note to get that brought over.

The mp3 resample bug you posted is interesting. I've yet to see this error. Are you able to provide the audio file you used so we can debug?

jrp2014 commented 6 months ago

Thanks. The README now seems to be corrected.

I'm sorry that I can't share the mp3. Perhaps the app ran out of memory as the clip is quite long.

ZachNagengast commented 6 months ago

OK, if you can replicate it with a file you can share, let us know. Memory seems like a good candidate. We will see if there is a better error message we can give there.
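
One way to test the memory hypothesis without sharing the original file is to cut a short clip from it and convert that clip to mono 16 kHz WAV, the input format Whisper models expect, before passing it to the CLI. The sketch below assumes ffmpeg is installed. The helper names wav_path_for and make_test_clip are hypothetical and not part of WhisperKit. It has not been verified against this particular bug.

```shell
# Hypothetical helpers, not part of WhisperKit. Only the ffmpeg flags are real:
#   -t <seconds>  keep only the first <seconds> of the input
#   -ac 1         downmix to one channel (mono)
#   -ar 16000     resample to 16 kHz

# Derive the output path by swapping the file extension:
#   /path/alice.mp3 -> /path/alice.16k.wav
# Assumes the file name has an extension.
wav_path_for() {
    printf '%s\n' "${1%.*}.16k.wav"
}

# Keep the first $2 seconds of $1 and write a mono 16 kHz WAV next to it.
make_test_clip() {
    ffmpeg -i "$1" -t "$2" -ac 1 -ar 16000 "$(wav_path_for "$1")"
}

# Usage (requires ffmpeg on PATH):
#   make_test_clip ~/.cache/whisper/alice.mp3 60
```

The resulting file can then be passed to the same command as before, for example with --audio-path ~/.cache/whisper/alice.16k.wav. If the short clip transcribes and the full file does not, that points toward length or memory rather than the MP3 encoding.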