biemster / gasr

Google Chrome SODA Offline Speech Recognition command line client
https://hackaday.io/project/164399-android-offline-speech-recognition-natively-on-pc
Transcribe speaker output or system sound (Windows: Stereo Mix) #20

ewwink closed this issue 9 months ago

ewwink commented 9 months ago

Can we configure libsoda/soda.dll to capture audio from the speaker output or system sound without an external app?

FaintWhisper commented 9 months ago

This would involve opening the current output device as a capture device through WASAPI (loopback recording) and routing that audio to SODA. It is straightforward to implement with the Windows Core Audio API [1].
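Whichever capture route is used, the samples generally have to be converted before SODA sees them. WASAPI loopback capture delivers audio in the endpoint's shared-mode mix format (queried with IAudioClient::GetMixFormat). The sketch below assumes that format is 32-bit float stereo at 48 kHz, which is common but not guaranteed, and converts it to the 16 kHz, mono, 16-bit stream that the SoX command later in this thread produces. The function name to_soda_pcm is hypothetical, and the decimator is deliberately crude: it averages frames instead of applying a proper anti-aliasing filter.

```python
import array
import sys


def to_soda_pcm(raw: bytes, channels: int = 2,
                in_rate: int = 48000, out_rate: int = 16000) -> bytes:
    """Convert interleaved little-endian float32 PCM to 16-bit mono PCM.

    Each output sample is the mean of `factor` consecutive downmixed frames:
    a crude box-filter decimator, not a proper anti-aliasing resampler.
    """
    if in_rate % out_rate:
        raise ValueError("in_rate must be an integer multiple of out_rate")
    factor = in_rate // out_rate

    samples = array.array("f")  # C float, 4 bytes on common platforms
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # input bytes are little-endian

    # Downmix: average the channels of each interleaved frame.
    frames = len(samples) // channels
    mono = [sum(samples[i * channels:(i + 1) * channels]) / channels
            for i in range(frames)]

    out = array.array("h")  # signed 16-bit samples
    for start in range(0, frames - factor + 1, factor):
        value = sum(mono[start:start + factor]) / factor
        value = max(-1.0, min(1.0, value))  # clip to the float range [-1, 1]
        out.append(round(value * 32767))
    if sys.byteorder == "big":
        out.byteswap()  # emit little-endian bytes
    return out.tobytes()
```

With the default 48 kHz to 16 kHz ratio, every three stereo frames become one 16-bit mono sample, so a 10 ms WASAPI packet of 480 frames yields 160 output samples (320 bytes).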

However, I have achieved the equivalent in a simpler way: my current setup uses Virtual Audio Cable [2] together with SoX to redirect the system's output audio to SODA. That setup runs on Linux, though. I have not managed to do the same natively on Windows with either SoX or FFmpeg, but it is definitely possible.

Additionally, if you want to hear the audio while routing it through the virtual audio device, as I do, open the Windows Sound control panel, go to the Recording tab, open the virtual device's Properties, and enable "Listen to this device" to redirect the audio to the speakers or headphones of your choice.

[1] Loopback Recording (Microsoft Learn): https://learn.microsoft.com/es-es/windows/win32/coreaudio/loopback-recording
[2] Virtual Audio Cable download page: https://vac.muzychenko.net/en/download.htm

ewwink commented 9 months ago

I will try, thank you.

FaintWhisper commented 9 months ago

I forgot to mention that I am on Windows with Virtual Audio Cable installed, but a patched version of SODA (version 1.1.1.7) and SoX run inside Windows Subsystem for Linux (WSL) rather than directly on Windows. Since you are also using Windows, you can adopt the setup from my previous comment as well; you only need to patch the Linux version of SODA.

Because WSLg ships a PulseAudio server, WSL exposes a source device called RDPSource that you can use to feed in the Windows system output audio (you can list the available sources with pactl list sources). This is the SoX command I am using, in case it is useful to you:

sox -t pulseaudio RDPSource --buffer 64 --input-buffer 64 -tf32 -L -r16000 -c1 -ts16 -q - | python3 gasr.py
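For driving the same pipeline from Python, here is a minimal sketch. It assumes the tab-separated layout of pactl list short sources (index, name, driver, sample spec, state) and that the stream is the 16 kHz, mono, 16-bit audio implied by -r16000, -c1 and -ts16 in the command above. It checks that RDPSource exists, then runs that SoX command and yields fixed-size chunks. All helper names (source_names, rdp_source_available, capture, and so on) are hypothetical, and how gasr.py itself consumes its stdin is not shown in this thread.

```python
import subprocess
from typing import BinaryIO, Iterator, List

# The SoX command from the comment above, split into an argv list.
SOX_ARGS = ["sox", "-t", "pulseaudio", "RDPSource",
            "--buffer", "64", "--input-buffer", "64",
            "-tf32", "-L", "-r16000", "-c1", "-ts16", "-q", "-"]

# Output format assumed from -r16000, -c1 and -ts16 in that command.
SAMPLE_RATE = 16000
CHANNELS = 1
BYTES_PER_SAMPLE = 2


def source_names(short_listing: str) -> List[str]:
    """Extract source names from `pactl list short sources` output.

    Each non-empty line is tab-separated: index, name, driver, sample spec, state.
    """
    return [line.split("\t")[1]
            for line in short_listing.splitlines() if line.strip()]


def rdp_source_available() -> bool:
    """Ask the WSLg PulseAudio server whether RDPSource exists."""
    listing = subprocess.run(["pactl", "list", "short", "sources"],
                             capture_output=True, text=True, check=True).stdout
    return "RDPSource" in source_names(listing)


def chunk_size(ms: int) -> int:
    """Number of bytes in `ms` milliseconds of 16 kHz mono s16 audio."""
    return SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE * ms // 1000


def read_chunks(stream: BinaryIO, size: int) -> Iterator[bytes]:
    """Yield fixed-size chunks from a byte stream; the last may be shorter."""
    while True:
        data = stream.read(size)
        if not data:
            return
        yield data


def capture(ms: int = 100) -> Iterator[bytes]:
    """Run SoX and yield `ms`-millisecond chunks of raw PCM from its stdout."""
    proc = subprocess.Popen(SOX_ARGS, stdout=subprocess.PIPE)
    try:
        yield from read_chunks(proc.stdout, chunk_size(ms))
    finally:
        proc.terminate()
        proc.wait()
```

At 16 kHz mono with 2 bytes per sample, 100 ms is 16000 × 2 × 0.1 = 3200 bytes, so a hypothetical consumer could loop over capture() once rdp_source_available() returns True and hand each 3200-byte chunk to the recognizer.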