argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
http://argmaxinc.com/blog/whisperkit
MIT License
3.92k stars 331 forks

Add audio device selector to transcribe + take a stab at Delete/Retry models #54

Closed cgfarmer4 closed 8 months ago

cgfarmer4 commented 8 months ago

Addresses #13 + adds audio device view in both sections.

  1. New delete model button (screenshot: 2024-03-06 at 9 32 03 PM)
  2. Restart download of model if connection lost (screenshot: 2024-03-06 at 9 49 04 PM)
  3. Move audio input selection between the buttons on transcribe (screenshot: 2024-03-06 at 9 49 17 PM)

ZachNagengast commented 8 months ago

Nice idea with the delete and retry! This is probably a good time to handle canceling the previous download task in resetState() because it's a static method. As-is, if you are in the middle of a download and tap the restart button, the previous download will continue and a new one will start. It may require something like this:

        let downloadTask = Task {
            folder = try await WhisperKit.download(variant: model, from: repoName, progressCallback: { progress in
                DispatchQueue.main.async {
                    loadingProgressValue = Float(progress.fractionCompleted) * specializationProgressRatio
                    modelState = .downloading
                }

                // Check for cancellation here
                try Task.checkCancellation()
            })
        }

or something similar to how the progress bar task gets cancelled:


                let progressBarTask = Task {
                    await updateProgressBar(targetProgress: 0.9, maxTime: 240)
                }

                // Prewarm models
                do {
                    try await whisperKit.prewarmModels()
                    progressBarTask.cancel()
                } catch {
                    print("Error prewarming models, retrying: \(error.localizedDescription)")
                    progressBarTask.cancel()
                    if !redownload {
                        loadModel(model, redownload: true)
                        return
                    } else {
                        // Redownloading failed, error out
                        modelState = .unloaded
                        return
                    }
                }

What do you think? This would also solve a few edge cases with the model selection, like if you select a new model while the previous one is downloading.
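
To make the cancel-before-restart idea concrete, here is a minimal sketch that keeps a handle to the in-flight download and cancels it before starting a new one. It is not WhisperKit code: `simulatedDownload` and `DownloadController` are hypothetical names, and the simulated download only stands in for a real call such as WhisperKit.download, which may or may not honor cancellation in the same way.

```swift
import Foundation

// Hypothetical stand-in for a real download call; it sleeps in small steps
// and checks for cancellation between them.
func simulatedDownload(variant: String, steps: Int = 50) async throws -> String {
    for _ in 0..<steps {
        try Task.checkCancellation() // throws CancellationError once the task is cancelled
        try await Task.sleep(nanoseconds: 10_000_000)
    }
    return "/models/\(variant)"
}

// Hypothetical owner of the in-flight download task (an assumption, not part of WhisperKit).
final class DownloadController {
    private var downloadTask: Task<String, Error>?

    // Cancels any previous download, then starts and remembers a new one.
    @discardableResult
    func startDownload(variant: String) -> Task<String, Error> {
        downloadTask?.cancel()
        let task = Task { try await simulatedDownload(variant: variant) }
        downloadTask = task
        return task
    }
}
```

With this shape, calling startDownload twice in a row (for example from a restart button or a model picker) leaves only the second download running; the first one ends with a CancellationError at its next cancellation check.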

ZachNagengast commented 8 months ago

Also played around with the mic selector placement, what do you think of this setup? Added a PR to your repo: https://github.com/cgfarmer4/WhisperKit/pull/1

cgfarmer4 commented 8 months ago

Beautiful, merged your PR and will take a look at cancelling the download tonight.

cgfarmer4 commented 8 months ago

I removed the retry; getting downloads to cancel is going to require a larger refactor of this function.

        let downloadTask = Task {
            folder = try await WhisperKit.download(variant: model, from: repoName, progressCallback: { progress in
                DispatchQueue.main.async {
                    loadingProgressValue = Float(progress.fractionCompleted) * specializationProgressRatio
                    modelState = .downloading
                }

                // Check for cancellation here
                try Task.checkCancellation()
            })
        }

Going down this path, you get the error "Mutation of captured var 'folder' in concurrently-executing code".

Then I tried this, which fixes the warning "Mutation of captured var 'folder' in concurrently-executing code; this is an error in Swift 6", but doesn't properly cancel the download.

DispatchQueue.main.async {
    folder = downloadFolder
}
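
For reference, one way to sidestep the captured-var diagnostic is to have the task return the folder as its result rather than assign to an outer variable. The sketch below is only an illustration under assumptions: `fetchModelFolder` and `loadFolder` are hypothetical names, and the fake download stands in for the real call. It addresses the diagnostic only; whether the underlying download actually stops still depends on the real download function honoring cancellation.

```swift
import Foundation

// Hypothetical stand-in for the real download call; returns a folder path.
func fetchModelFolder(variant: String) async throws -> String {
    try await Task.sleep(nanoseconds: 10_000_000)
    return "/models/\(variant)"
}

// The task yields the folder as its value, so no captured var is mutated
// from concurrently-executing code.
func loadFolder(variant: String) async throws -> String {
    let downloadTask = Task { try await fetchModelFolder(variant: variant) }
    // Forward cancellation of the caller to the unstructured task.
    return try await withTaskCancellationHandler {
        try await downloadTask.value
    } onCancel: {
        downloadTask.cancel()
    }
}
```

The caller can then write something like `let folder = try await loadFolder(variant: "tiny")` and assign the result on whichever actor owns the state.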

The next approach I tried was assigning the main Task similar to how transcriptionTask is assigned, but that requires another refactor around the do/catch statements.

How about I leave in the delete + your update and call it on this one? The download cancellation needs a rethink, or I'm totally missing something.

ZachNagengast commented 8 months ago

No problem! We can get to that later on, but this is a good addition as-is 👍