argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License

main branch may have a streaming issue #215

Open HelloSzymon opened 2 weeks ago

HelloSzymon commented 2 weeks ago

Hello,

I have what I think is a weird problem: the code taken from the SDK's ContentView is not transcribing the voice. I tested the ContentView code on the iPhone simulator, a real iPhone, and the Apple Vision Pro simulator, with the same result on all of them: some random text:

[Screenshot: Simulator Screenshot - Apple Vision Pro - 2024-10-04 at 00 06 42]

I also tried the basic implementation provided in the documentation, and print(transcription) returns nil. For now I don't know what the problem is or where to look for it. The package dependency is set to the main branch, and the audio file is set correctly.


import SwiftUI
import WhisperKit

struct ContentView: View {
    var body: some View {
        VStack {
            Image(systemName: "globe")
                .imageScale(.large)
                .foregroundStyle(.tint)
            Text("Hello, world!")
        }
        .padding()
        .onAppear {
            Task {
                let pipe = try? await WhisperKit()
                let path = Bundle.main.url(forResource: "MP3sample", withExtension: "mp3")!.absoluteString

                let transcription = try? await pipe!.transcribe(audioPath: path)?.text
                print(transcription)
            }
        }
    }
}
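
As a diagnostic variant (my own sketch, not from the WhisperKit docs), replacing the try? calls with do/catch makes any thrown error visible instead of silently producing nil, and passing url.path instead of url.absoluteString avoids handing transcribe(audioPath:) a file:// URL string, in case it expects a plain filesystem path. DebugContentView is just a placeholder name:

import SwiftUI
import WhisperKit

// Hypothetical diagnostic view, assuming the same WhisperKit API as above.
struct DebugContentView: View {
    var body: some View {
        Text("Transcribing…")
            .onAppear {
                Task {
                    do {
                        let pipe = try await WhisperKit()
                        // Fail loudly if the bundled audio file is missing.
                        guard let url = Bundle.main.url(forResource: "MP3sample", withExtension: "mp3") else {
                            print("MP3sample.mp3 not found in the app bundle")
                            return
                        }
                        // url.path is a plain filesystem path, not a file:// URL string.
                        let transcription = try await pipe.transcribe(audioPath: url.path)?.text
                        print(transcription ?? "transcription text was nil")
                    } catch {
                        // Surfaces model-load or transcription errors that try? would swallow.
                        print("WhisperKit error: \(error)")
                    }
                }
            }
    }
}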
atiorh commented 1 week ago

@HelloSzymon