argmaxinc / WhisperKit

On-device Inference of Whisper Speech Recognition Models for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

Error calling whisperKit.prewarmModels() in iOS app #171

Open reubenab opened 2 weeks ago

reubenab commented 2 weeks ago

I have the following function defined in my Swift iOS app, copied largely from the example app in the repo:

    func loadModel(isRedownloadAttempt: Bool) {
        // First check what's already downloaded
        if let documents = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first {
            print("[TranscriptionViewModel] Checking downloaded documents")
            let modelPath = documents.appendingPathComponent(modelStorage).path

            // Check if the directory exists
            if FileManager.default.fileExists(atPath: modelPath) {
                localModelPath = modelPath
                do {
                    let downloadedModels = try FileManager.default.contentsOfDirectory(atPath: modelPath)
                    for model in downloadedModels where !localModels.contains(model) {
                        localModels.append(model)
                    }
                } catch {
                    print("[TranscriptionViewModel] Error enumerating files at \(modelPath): \(error.localizedDescription)")
                }
            }
        }
        localModels = WhisperKit.formatModelFiles(localModels)
        print("[TranscriptionViewModel] Found locally: \(localModels)")
        print("[TranscriptionViewModel] Local model path: \(localModelPath)")

        DispatchQueue.main.async {
            self.whisperKit = nil
        }
        Task {
            let whisperKitInstance = try await WhisperKit(
                verbose: true,
                logLevel: .error,
                prewarm: false,
                load: false,
                download: false
            )
            await MainActor.run {
                self.whisperKit = whisperKitInstance
            }
            guard let whisperKit = whisperKit else {
                return
            }

            var folder: URL?

            // Check if the model is available locally
            if localModels.contains(WhisperKit.recommendedModels().default) && !isRedownloadAttempt {
                // Get local model folder URL from localModels
                print("[TranscriptionViewModel] Retrieving stored transcription model")
                folder = URL(fileURLWithPath: localModelPath).appendingPathComponent(WhisperKit.recommendedModels().default)
            } else {
                // Download the model
                print("[TranscriptionViewModel] Downloading transcription model")
                folder = try await WhisperKit.download(variant: WhisperKit.recommendedModels().default, from: repoName, progressCallback: { progress in
                    DispatchQueue.main.async {
                        self.loadingProgressValue = Float(progress.fractionCompleted) * self.specializationProgressRatio
                        self.modelState = .downloading
                    }
                })
            }

            await MainActor.run {
                self.loadingProgressValue = self.specializationProgressRatio
                modelState = .downloaded
            }

            if let modelFolder = folder {
                whisperKit.modelFolder = modelFolder

                await MainActor.run {
                    self.loadingProgressValue = self.specializationProgressRatio
                    modelState = .prewarming
                }

                // Set the loading progress to 90% of the way after prewarm
                let progressBarTask = Task {
                    await updateProgressBar(targetProgress: 0.9, maxTime: 240)
                }

                // Prewarm models
                do {
                    print("[TranscriptionViewModel] Prewarming transcription model")
                    try await whisperKit.prewarmModels()
                    progressBarTask.cancel()
                } catch {
                    print("[TranscriptionViewModel] Error prewarming models, retrying: \(error.localizedDescription)")
                    progressBarTask.cancel()
                    if !isRedownloadAttempt {
                        loadModel(isRedownloadAttempt: true)
                        return
                    } else {
                        // Redownloading failed, error out
                        await MainActor.run {
                            self.modelState = .unloaded
                        }
                        // TODO: ADD ERROR TOAST
                        return
                    }
                }

                await MainActor.run {
                    // Set the loading progress to 90% of the way after prewarm
                    loadingProgressValue = specializationProgressRatio + 0.9 * (1 - specializationProgressRatio)
                    modelState = .loading
                }

                print("[TranscriptionViewModel] Loading transcription model")
                try await whisperKit.loadModels()

                await MainActor.run {
                    if !localModels.contains(WhisperKit.recommendedModels().default) {
                        localModels.append(WhisperKit.recommendedModels().default)
                    }

                    print("[TranscriptionViewModel] Finished loading transcription model")
                    loadingProgressValue = 1.0
                    modelState = whisperKit.modelState
                }
            }
        }
    }

A few users of the app reached out saying the transcription model was getting stuck while loading. I was able to reproduce the error below on the iPhone SE (3rd generation) simulator running iOS 17.4. It looks like the modelSupport function doesn't have an explicit check for the iPhone SE, but it should fall back to openai_whisper-base (which is fine).

In my logging, I found the following statement, which comes from the catch block around try await whisperKit.prewarmModels(): "[TranscriptionViewModel] Error prewarming models, retrying: Unable to load model: file:///Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/. Compile the model with Xcode or MLModel.compileModel(at:). "

Just before that, there was a longer warning message:

/Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/coremldata.bin is not a valid .mlmodelc file because the first word (1329865020) is not recognizable.

MLModelAsset: load failed with error Error Domain=com.apple.CoreML Code=0 "Unable to load model: file:///Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/. Compile the model with Xcode or `MLModel.compileModel(at:)`. " UserInfo={NSLocalizedDescription=Unable to load model: file:///Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/. Compile the model with Xcode or `MLModel.compileModel(at:)`. , NSUnderlyingError=0x600000c71c80 {Error Domain=com.apple.CoreML Code=3 "/Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/coremldata.bin is not a valid .mlmodelc file because the first word (1329865020) is not recognizable. : unspecified iostream_category error" UserInfo={NSLocalizedDescription=/Users/reubenabraham/Library/Developer/CoreSimulator/Devices/02AD2DD3-5F29-443A-881D-13F2FDE47BC7/data/Containers/Data/Application/787AB037-00B7-49C8-9503-0EAB34972520/Documents/huggingface/models/argmaxinc/whisperkit-coreml/openai_whisper-base/MelSpectrogram.mlmodelc/coremldata.bin is not a valid .mlmodelc file because the first word (1329865020) is not recognizable. : unspecified iostream_category error}}}

I'm not sure what to make of this error message or how to proceed and fix it. What am I missing? Everything works fine for most of my users, but about 20% have run into this problem (or something similar during the loading process).

reubenab commented 2 weeks ago

Additionally, when looking at the files at that path, I see that a MelSpectrogram.mlmodelc file does exist, but it must be corrupted in some way.
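
For reference, a quick way to see that the bundle is incomplete is to list its contents and file sizes. This is just a FileManager sketch run inside the view model; it assumes the same localModelPath property set in loadModel above, and the MelSpectrogram.mlmodelc name comes from the error message:

let melSpectrogramURL = URL(fileURLWithPath: localModelPath)
    .appendingPathComponent(WhisperKit.recommendedModels().default)
    .appendingPathComponent("MelSpectrogram.mlmodelc")
// List each file in the compiled model bundle along with its size on disk
if let files = try? FileManager.default.contentsOfDirectory(
    at: melSpectrogramURL,
    includingPropertiesForKeys: [.fileSizeKey]
) {
    for file in files {
        let values = try? file.resourceValues(forKeys: [.fileSizeKey])
        print("\(file.lastPathComponent): \(values?.fileSize ?? 0) bytes")
    }
}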

ZachNagengast commented 2 weeks ago

This looks like a partial download. I think a fix could attempt to re-download this particular file if it fails to load. Thanks for flagging.

reubenab commented 2 weeks ago

@ZachNagengast Thank you for the response! Just to make sure I understand: is this a fix you're looking at bringing into the core library? Would anything need to change in the code block above?

ZachNagengast commented 3 days ago

Essentially, the fix here is to check for any throw when loading the model and, if it fails, attempt to re-download it. This can also be fixed by using the useBackgroundSession: true optional init param.
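
As a rough sketch of what that could look like in your view model (loadOrRedownload is a hypothetical helper, and repoName and the download call mirror your snippet above; this is not the exact change that will land in the library):

// If prewarming/loading throws, treat the local copy as corrupted,
// delete it, re-download, and try once more.
func loadOrRedownload(_ whisperKit: WhisperKit, modelFolder: URL, repoName: String) async throws {
    whisperKit.modelFolder = modelFolder
    do {
        try await whisperKit.prewarmModels()
        try await whisperKit.loadModels()
    } catch {
        print("Model load failed (\(error.localizedDescription)), re-downloading")
        // Remove the suspect folder so the retry starts from a clean download
        try? FileManager.default.removeItem(at: modelFolder)
        let freshFolder = try await WhisperKit.download(
            variant: WhisperKit.recommendedModels().default,
            from: repoName,
            progressCallback: { _ in }
        )
        whisperKit.modelFolder = freshFolder
        try await whisperKit.prewarmModels()
        try await whisperKit.loadModels()
    }
}

Keeping the retry to a single attempt mirrors the isRedownloadAttempt flag you already have, so a persistent failure still surfaces as an error instead of looping.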