Closed — jrp2014 closed 6 months ago
It's mentioned in the README here: https://github.com/argmaxinc/WhisperKit?tab=readme-ov-file#swift-cli

Did you see somewhere else that had `swift run transcribe`? We will update it if you can point us to it.
Regarding the timestamps, we do have a `clipTimestamps` parameter in the Swift library, but it's not currently in the CLI; making a note to get that brought over.
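For reference, a minimal sketch of how that might look in the Swift library (assuming the `WhisperKit` initializer and `DecodingOptions` shapes from the README; exact signatures may differ by version):

```swift
import WhisperKit

// Sketch: restrict decoding to the span between 0 s and 30 s of the
// input audio via clipTimestamps (seconds, given as start/end pairs).
Task {
    let pipe = try await WhisperKit()
    let options = DecodingOptions(clipTimestamps: [0.0, 30.0])
    let results = try await pipe.transcribe(
        audioPath: "path/to/audio.mp3",
        decodeOptions: options
    )
    print(results)
}
```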
The mp3 resample bug you posted is interesting; I've yet to see this error. Are you able to provide the audio file you used so we can debug?
Thanks. The README now seems to be corrected.
I'm sorry that I can't share the mp3. Perhaps the app ran out of memory as the clip is quite long.
Ok, if you can replicate it with a file you can share, let us know. Memory seems like a good candidate; we'll see if there is a better error message we can give there.
Is the word `whisperkit-cli` missing from the README? If I don't include it, I get:

`error: no executable product named 'transcribe'`

Transcription seems to be pretty slow, with no use of the GPU.
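For context, the full invocation as given in the CLI section of the README looks like the following (run from a checkout of the WhisperKit repo; flags may vary by version):

```shell
# Build and run the CLI by its executable product name, then pass
# the transcribe subcommand and the path to the audio file.
swift run whisperkit-cli transcribe --audio-path "path/to/audio.mp3"
```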
The output is a wall of text, with some capitalisation anomalies.
Using the mlx whisper, you can add timestamps to the output, so that if two people are speaking, the transcript starts each change of speaker on a new line. Is the same capability available here?
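On the Swift side, something along these lines should produce per-segment timestamps (a sketch assuming `TranscriptionResult` exposes `segments` with `start`, `end`, and `text` fields; this is not available from the CLI):

```swift
import WhisperKit

// Sketch: print each decoded segment on its own line with its start
// and end timestamps, so changes of speaker are easier to spot.
Task {
    let pipe = try await WhisperKit()
    let results = try await pipe.transcribe(audioPath: "path/to/audio.mp3")
    for result in results {
        for segment in result.segments {
            print("[\(segment.start) - \(segment.end)] \(segment.text)")
        }
    }
}
```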
I'm not sure what MP3 formats are supported. I got

`Error when transcribing /Users/xxx.mp3: loadAudioFailed("Unable to resample audio")`

from a stereo 44.1 kHz .mp3 file. I'm also not sure whether I'm using the large-v3 model for 30 s clips or the one for full-length transcripts.