argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.92k stars, 330 forks

Prompt string being returned as transcription result #162

Closed: nchudleigh closed this issue 2 months ago

nchudleigh commented 5 months ago

let promptString = "This will end up being returned as transcriptionResult"

let transcriptionResult = try await whisperKit.transcribe(
    audioArray: samples,
    decodeOptions: DecodingOptions(
        language: modeManager.activeMode.language,
        skipSpecialTokens: true,
        promptTokens: whisperKit.tokenizer.encode(text: promptString)
    )
)

print(promptString == transcriptionResult) // true

ZachNagengast commented 4 months ago

This happens because the encoded promptTokens currently include special tokens, which are passed to the decoder unfiltered. I'm adding a fix that filters them out by default, equivalent to whisperKit.tokenizer.encode(text: promptString).filter { $0 < tokenizer.specialTokens.specialTokenBegin }

Thanks for the report. The fix will be in the next release shortly.
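
For readers who want to see the filtering idea in isolation, here is a minimal, self-contained sketch that does not depend on WhisperKit. The helper name `filterSpecialTokens`, the threshold value 50257, and the token IDs are all illustrative assumptions; in real code the threshold would come from `tokenizer.specialTokens.specialTokenBegin` and the IDs from `tokenizer.encode(text:)`.

```swift
// Hypothetical stand-in for `tokenizer.specialTokens.specialTokenBegin`.
// The value 50257 is used here only as an illustrative threshold.
let specialTokenBegin = 50257

/// Keeps only token IDs below the special-token threshold,
/// so that only ordinary text tokens remain in the prompt.
func filterSpecialTokens(_ tokens: [Int], specialTokenBegin: Int) -> [Int] {
    tokens.filter { $0 < specialTokenBegin }
}

// Made-up encoder output: text tokens surrounded by special tokens.
let encoded = [50258, 50363, 1212, 481, 886, 50257]
let promptTokens = filterSpecialTokens(encoded, specialTokenBegin: specialTokenBegin)
print(promptTokens) // [1212, 481, 886]
```

As a workaround before the fix ships, the same `.filter` can presumably be applied to the encoded prompt before it is passed as `promptTokens` in `DecodingOptions`.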

ZachNagengast commented 2 months ago

Should be resolved with #183