argmaxinc / WhisperKit

On-device Speech Recognition for Apple Silicon
https://takeargmax.com/blog/whisperkit
MIT License
3.17k stars · 267 forks

Audio input captures only the first channel #134

Open iGerman00 opened 4 months ago

iGerman00 commented 4 months ago

Great library and equally great demo app! I tried the demo and it works quite well with the built-in mic of my MacBook, but it seems that the microphone input currently captures only the first channel. This is fine in 90% of use cases, but my audio interface presents itself to the system as a stereo input, with the left channel being one physical input and the right being another, and my mic is plugged into the right one (so floatChannelData[1] instead of [0]). That renders the app unusable, because there is no way to switch or combine those inputs in macOS or in the app.

I suggest summing all inputs of the device, or at least giving the user the option to choose which input channel to use. The issue seems to lie in this line, so it's in the library itself rather than the demo app, though I'm not certain.
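For illustration, summing could look something like the sketch below. It is untested and the helper name is mine, not part of WhisperKit; it just averages whatever channels the buffer exposes instead of reading only floatChannelData[0]:

```swift
import AVFoundation

// Hypothetical helper (not WhisperKit API): downmix a PCM buffer to mono by
// averaging every channel, so a mic on channel 1 (or any other channel) is kept.
func downmixToMono(_ buffer: AVAudioPCMBuffer) -> [Float] {
    guard let channelData = buffer.floatChannelData else { return [] }
    let channelCount = Int(buffer.format.channelCount)
    let frameCount = Int(buffer.frameLength)
    var mono = [Float](repeating: 0, count: frameCount)

    // Accumulate each channel sample-by-sample...
    for channel in 0..<channelCount {
        let samples = channelData[channel]
        for frame in 0..<frameCount {
            mono[frame] += samples[frame]
        }
    }
    // ...then scale by the channel count to keep levels comparable to a single channel.
    let scale = 1.0 / Float(max(channelCount, 1))
    for frame in 0..<frameCount {
        mono[frame] *= scale
    }
    return mono
}
```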

I would really appreciate a quick fix and a push to TestFlight; I'm not trying to set up Xcode and compile the app myself at the moment (I don't even have the disk space for that :P).

ZachNagengast commented 4 months ago

Thanks for raising this; I've heard of similar issues with input channels. I'm aware of some ways to combine the channels from stereo to mono with a mixer node, but I will need to investigate. In the meantime, the simplest workaround would be to remap your mic input with an app like BlackHole or Soundflower. The TestFlight app was purely intended as an example of how to use the library, though I'm glad to hear some folks are getting use out of it!
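For anyone curious, the mixer-node idea would look roughly like the untested sketch below (plain AVAudioEngine usage, not WhisperKit's current code): an AVAudioMixerNode sits between the input node and the tap and performs the downmix to mono inside the engine:

```swift
import AVFoundation

// Untested sketch: route the (possibly multi-channel) input through an
// AVAudioMixerNode and tap the mixer in mono, letting the engine do the downmix.
func startMonoCapture() throws -> AVAudioEngine {
    let engine = AVAudioEngine()
    let downmixer = AVAudioMixerNode()
    engine.attach(downmixer)

    // Feed the raw input (stereo, or more channels) into the mixer.
    let inputFormat = engine.inputNode.outputFormat(forBus: 0)
    engine.connect(engine.inputNode, to: downmixer, format: inputFormat)

    // Ask for mono at the hardware sample rate on the mixer's output.
    let monoFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                   sampleRate: inputFormat.sampleRate,
                                   channels: 1,
                                   interleaved: false)!
    engine.connect(downmixer, to: engine.mainMixerNode, format: monoFormat)
    engine.mainMixerNode.outputVolume = 0  // don't monitor the mic through the speakers

    downmixer.installTap(onBus: 0, bufferSize: 4096, format: monoFormat) { buffer, _ in
        // buffer is mono here; a mic plugged into either physical input ends up in channel 0.
    }

    engine.prepare()
    try engine.start()
    return engine
}
```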

amirvenus commented 1 month ago

Did you have any luck with setting the audio input channel?

ZachNagengast commented 1 month ago

I'm interested in alternatives to AVAudioEngine for audio input. If anyone has ideas for other options, please post them in this thread.
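As one possibility (an untested sketch, not tied to WhisperKit's current API), AVCaptureSession with an AVCaptureAudioDataOutput delivers raw sample buffers and allows picking a specific capture device:

```swift
import AVFoundation

// Minimal sketch of capturing audio with AVCaptureSession instead of AVAudioEngine.
// Device selection, permissions, and format handling are left out for brevity.
final class CaptureSessionRecorder: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    let session = AVCaptureSession()
    private let queue = DispatchQueue(label: "audio.capture.queue")

    func start() throws {
        guard let device = AVCaptureDevice.default(for: .audio) else { return }
        let input = try AVCaptureDeviceInput(device: device)
        let output = AVCaptureAudioDataOutput()
        output.setSampleBufferDelegate(self, queue: queue)

        session.beginConfiguration()
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(output) { session.addOutput(output) }
        session.commitConfiguration()
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Each sampleBuffer carries the captured audio; inspect its format
        // description and downmix/convert here as needed.
    }
}
```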