argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars, 268 forks

Implement selecting input device #51

Closed: cgfarmer4 closed this 6 months ago

cgfarmer4 commented 6 months ago

Allow for selection of Core Audio device to be used with AVAudioEngine.

Please note this will only work on macOS. I'm not sure there's a cleaner way to do this than the #if os(macOS) check.
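One way to keep the #if os(macOS) check contained is to confine it to a single function, so callers on every platform see the same signature. The sketch below is only an illustration of that idea: AudioInputDevice, selectableInputDevices, and the macOSProvider closure are hypothetical names, not part of WhisperKit or this PR, and the real Core Audio enumeration is left to the closure.

```swift
// Hypothetical type for illustration; not WhisperKit's actual API.
struct AudioInputDevice: Equatable {
    let id: UInt32      // Core Audio device IDs are 32-bit unsigned integers
    let name: String
}

/// Returns the input devices a user may pick from.
/// On macOS the list comes from `macOSProvider` (assumed to wrap the
/// Core Audio device query); on other platforms the list is empty,
/// so the UI can simply hide the picker.
func selectableInputDevices(
    macOSProvider: () -> [AudioInputDevice]
) -> [AudioInputDevice] {
    #if os(macOS)
    return macOSProvider()
    #else
    return []
    #endif
}

// Usage: the call site contains no platform check of its own.
let devices = selectableInputDevices {
    [AudioInputDevice(id: 1, name: "Example Interface")]
}
```

With this shape the conditional lives in one place, and the rest of the code only has to handle an empty device list.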

https://streamable.com/vv0zjb

cgfarmer4 commented 6 months ago

I'm seeing a test fail, but it looks unrelated to my changes?

Test Suite 'All tests' started at 2024-03-05 20:21:19.203.
Test Suite 'whisperkitPackageTests.xctest' started at 2024-03-05 20:21:19.205.
Test Suite 'FunctionalTests' started at 2024-03-05 20:21:19.205.
Test Case '-[WhisperKitTests.FunctionalTests testAsyncImplementation]' started.
WhisperKit/WhisperKit.swift:192: Fatal error: Unexpectedly found nil while unwrapping an Optional value

cgfarmer4 commented 6 months ago

Awesome, thanks for the feedback. I'll work through this over the next couple of days!

cgfarmer4 commented 6 months ago

@ZachNagengast I cleaned up the forks. I'm not that experienced with multi-platform code, so thanks for the suggestion. Fewer forks, cleaner code 💪🏻. The controls look like this now. I tried to find a way to add them without throwing off the balance, and this seems to work OK. Happy to keep iterating if you have more feedback.

image

Lastly, when I first plugged in my audio interface on app start, I got a new exception from CoreAudio: required condition is false: format.sampleRate == hwFormat.sampleRate. Rebooting my machine resolved this, but I'm not sure how much we care to guard against it here. I found some potential paths here, but they don't seem aligned with the need for 16 kHz output. Thoughts?

ZachNagengast commented 6 months ago

Regarding the error, we do need to iron out requesting a 16 kHz sample rate for the devices that support it, but that should be in its own PR. We have a converter that will deal with anything not in 16 kHz, but it adds a small and avoidable latency.
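To make the conversion step concrete, here is a minimal, dependency-free sketch of resampling mono samples to 16 kHz by linear interpolation. This is not WhisperKit's converter (on Apple platforms AVAudioConverter is the usual tool for this); resampleLinear and its parameters are hypothetical names used for illustration only. Plain linear interpolation applies no low-pass filter, so it can alias when downsampling.

```swift
/// Resamples mono Float samples from `sourceRate` to `targetRate`
/// using linear interpolation between neighbouring input samples.
/// Illustrative only: no anti-aliasing filter is applied.
func resampleLinear(
    _ input: [Float],
    from sourceRate: Double,
    to targetRate: Double = 16_000
) -> [Float] {
    guard !input.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    // Nothing to do when the device already delivers the target rate.
    if sourceRate == targetRate { return input }

    let outputCount = Int(Double(input.count) * targetRate / sourceRate)
    let step = sourceRate / targetRate   // input samples advanced per output sample

    var output = [Float]()
    output.reserveCapacity(outputCount)
    for i in 0..<outputCount {
        let position = Double(i) * step
        let lower = Int(position)
        let upper = min(lower + 1, input.count - 1)  // clamp at the last sample
        let fraction = Float(position - Double(lower))
        output.append(input[lower] * (1 - fraction) + input[upper] * fraction)
    }
    return output
}

// Usage: six samples captured at 48 kHz become two samples at 16 kHz.
let downsampled = resampleLinear([0, 1, 2, 3, 4, 5], from: 48_000)
```

The early return for matching rates reflects the point made above: when a device can be asked for 16 kHz directly, the conversion pass and its latency are skipped entirely.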

cgfarmer4 commented 6 months ago

Nice! I can add it between the Transcribe buttons in a follow-up PR, np.