argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars, 267 forks

`detectLanguage` isn't working #168

Open stasbel opened 3 months ago

stasbel commented 3 months ago

Version: 0.7.2

Snippet to reproduce:

import Foundation
import WhisperKit

Task {
  guard let desktopURL = FileManager.default.urls(
    for: .desktopDirectory,
    in: .userDomainMask
  ).first else { return }
  print("desktopURL: \(desktopURL)")

  let model = try await WhisperKit(model: "base")
  try await model.loadModels(prewarmMode: true)

  let fileURL = desktopURL.appendingPathComponent("0.wav")
  print("absoluteString: \(fileURL.path)")
  let (lang, langProbs) = try await model.detectLanguage(audioPath: fileURL.path)
  print("lang: \(lang), langProbs: \(langProbs)")
}

file: 0.wav.zip

tiny version gives:

lang: en, langProbs: ["en": -0.35910118]

which lists only one language instead of all of them, and the probability is negative
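A negative value like this would be consistent with a log-probability rather than a probability in the 0 to 1 range. That is an assumption; nothing in this thread confirms what `langProbs` contains. Under that assumption, a minimal sketch of the conversion, using the value printed above (the variable names are illustrative, not part of WhisperKit):

```swift
import Foundation

// Assumption: the values in langProbs are natural-log probabilities.
// The number below is copied from the output of the tiny model above.
let langLogProbs: [String: Float] = ["en": -0.35910118]

// exp() maps a log-probability back into the 0...1 range.
let langProbabilities = langLogProbs.mapValues { exp($0) }

for (lang, probability) in langProbabilities {
    print("\(lang): \(probability)")
}
```

If the assumption holds, the reported value corresponds to a probability of roughly 0.70 for `en`.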

base version gives:

[WhisperKit] Detected language nocaptions is not supported, defaulting to en
lang: en, langProbs: [:]

which looks like a plain error somewhere inside WhisperKit

atiorh commented 3 months ago

Thanks for the report!

stasbel commented 3 months ago

Thanks! What about predicting multiple languages and their corresponding probabilities? Also, why "no speech is detected"? The audio example clearly contains speech :)

stasbel commented 2 months ago

@ZachNagengast hi, is it on track to be completed anytime soon?

ZachNagengast commented 2 months ago

@stasbel This looks like it was actually a tiny oversight in the filter logic and not related to the sampling issues we just put out a fix for. Please check main with commit https://github.com/argmaxinc/WhisperKit/commit/c93d613c8c6fc3ec5b9d1d9da5d1f4206183c5e4 for the fix.
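For readers unfamiliar with this kind of filter, here is a rough sketch of the general technique used for Whisper-style language detection: restrict the logits to language tokens, then apply a softmax over only those. This is not WhisperKit's actual code and says nothing about what the linked commit changed; the token indices, the `languageProbabilities` function, and the logit values are all hypothetical.

```swift
import Foundation

// Hypothetical vocabulary: indices 1, 2 and 4 are language tokens,
// indices 0 and 3 stand in for non-language special tokens.
let languageTokens: [Int: String] = [1: "en", 2: "de", 4: "ru"]

/// Softmax restricted to language-token positions.
/// Returns one probability per language code, summing to 1.
func languageProbabilities(logits: [Float], languageTokens: [Int: String]) -> [String: Float] {
    // Filter step: keep only the logits that belong to language tokens.
    let kept = languageTokens.compactMap { entry -> (String, Float)? in
        guard logits.indices.contains(entry.key) else { return nil }
        return (entry.value, logits[entry.key])
    }
    guard let maxLogit = kept.map({ $0.1 }).max() else { return [:] }

    // Subtract the maximum before exponentiating for numerical stability.
    let exps = kept.map { ($0.0, exp($0.1 - maxLogit)) }
    let total = exps.reduce(Float(0)) { $0 + $1.1 }
    return Dictionary(uniqueKeysWithValues: exps.map { ($0.0, $0.1 / total) })
}

// Made-up logits: the largest value (index 0) is NOT a language token.
let logits: [Float] = [9.0, 2.0, 1.0, 8.5, 0.5]
let probs = languageProbabilities(logits: logits, languageTokens: languageTokens)
let best = probs.max { $0.value < $1.value }

print(probs)
print(best?.key ?? "none")
```

In this toy input, an unfiltered argmax would pick index 0, a non-language token, while the filtered version yields a distribution over the three language codes only. That illustrates why a gap in such a filter could surface an unsupported token, though whether this matches the actual bug is not established here.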

ZachNagengast commented 1 month ago

@stasbel Checking on this, is your issue resolved with the latest change?