Macoron / whisper.unity

Running speech to text model (whisper.cpp) in Unity3d on your local machine.
MIT License
389 stars 84 forks

Is it possible for in-game audio sources to be omitted from whisper? #62

Open yosun opened 9 months ago

yosun commented 9 months ago

Is it possible for in-game audio sources to be omitted from whisper?

Macoron commented 9 months ago

If I understand your question correctly (for example, a player's speakers are loud enough to be picked up by the microphone), then the answer is no.

Whisper works directly on the audio input from the microphone or an audio clip, and it has no feature for filtering out background noise or in-game audio that leaks into that input. The only suggestion I can offer is to fine-tune the Voice Activity Detection (VAD) settings so that it does not react to in-game sounds captured by the microphone.
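To illustrate what tuning a VAD threshold does, here is a minimal sketch of a generic energy-based detector. It is not whisper.unity's actual VAD; the function names, frame size, and threshold values are assumptions chosen for the example. Each frame is marked as voice only when its RMS level exceeds the threshold, so quiet speaker bleed is ignored while louder speech still triggers detection.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples in [-1, 1]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_voice(samples, frame_size=160, threshold=0.1):
    """Return one flag per full frame: True when the frame RMS exceeds threshold.

    frame_size and threshold are hypothetical defaults, not values taken
    from whisper.unity or whisper.cpp.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags


def tone(amplitude, count, freq=200.0, sample_rate=16000):
    """Synthetic test signal: a sine wave standing in for recorded audio."""
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
            for i in range(count)]


if __name__ == "__main__":
    bleed = tone(0.05, 1600)   # quiet in-game audio picked up by the microphone
    speech = tone(0.5, 1600)   # the player speaking close to the microphone
    print(detect_voice(bleed + speech, threshold=0.1))
```

A threshold like this only helps when the in-game audio reaches the microphone at a clearly lower level than the player's voice. It cannot tell NPC speech from player speech of similar loudness.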

yosun commented 9 months ago

What if there is speaking audio from the game (NPCs) ... that you don't want whisper to pick up?


Macoron commented 9 months ago

> What if there is speaking audio from the game (NPCs) ... that you don't want whisper to pick up?

The simplest way is to use headphones. You can also add a custom push-to-talk button that turns off NPC voices while the player is speaking. In the worst case, you would need to implement custom audio filters.

Whisper.cpp also supports speech segmentation (https://github.com/ggerganov/whisper.cpp/pull/1058), but this is probably overkill, and it isn't supported by the Unity bindings yet.

yosun commented 9 months ago

Something like vocalremover.org, but one that takes the in-game audio as input and removes it from the microphone signal (?)

Macoron commented 9 months ago

> Something like vocalremover.org, but one that takes the in-game audio as input and removes it from the microphone signal (?)

To be honest, I don't know. There is no built-in solution in whisper.cpp that can do something like that.

Tyrannicus100BC commented 8 months ago

The technical term is "echo cancellation" (often "acoustic echo cancellation", AEC), and it is mostly used in audio chat apps. The idea is to keep track of the audio that was recently played through the computer's speakers and to look for it in the incoming microphone signal. The matching part is then removed from the captured microphone input before it is fed into whisper.

I did some looking around recently and couldn't find any good echo cancellation libraries for Unity :-/
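For anyone curious what the core of echo cancellation looks like, below is a minimal sketch using a normalized least-mean-squares (NLMS) adaptive filter, one common textbook approach. It is not a Unity or whisper.cpp API; the function names, tap count, step size, and the simulated echo path are all assumptions made for the example. The filter learns how the played-back (reference) audio shows up in the microphone signal and subtracts that estimate sample by sample.

```python
import random


def nlms_echo_cancel(mic, reference, taps=16, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the reference audio from the mic signal.

    mic       -- microphone samples (player voice plus speaker echo)
    reference -- samples that were sent to the speakers (in-game audio)
    taps, mu, eps are hypothetical tuning values for this sketch.
    """
    weights = [0.0] * taps   # current estimate of the speaker-to-mic echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate        # mic with estimated echo removed
        power = sum(x * x for x in history) + eps # normalization term
        step = mu * error / power
        weights = [w + step * x for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def energy(samples):
    """Sum of squared samples."""
    return sum(s * s for s in samples)


def echo_reduction_demo(count=4000, tail=1000):
    """Simulate game audio leaking into the mic and measure echo energy
    before and after cancellation over the last `tail` samples."""
    rng = random.Random(0)
    game_audio = [rng.uniform(-1.0, 1.0) for _ in range(count)]
    # Hypothetical echo path: two delayed, attenuated copies of the game audio.
    mic = []
    for i in range(count):
        direct = 0.6 * game_audio[i - 3] if i >= 3 else 0.0
        reflection = 0.3 * game_audio[i - 7] if i >= 7 else 0.0
        mic.append(direct + reflection)
    cleaned = nlms_echo_cancel(mic, game_audio)
    return energy(mic[-tail:]), energy(cleaned[-tail:])


if __name__ == "__main__":
    before, after = echo_reduction_demo()
    print("echo energy before:", before, "after:", after)
```

This demo contains only echo and no player speech, so it shows the filter converging under ideal conditions. A practical echo canceller also has to cope with the player talking over the game audio, variable playback latency, and speaker distortion, which is why a dedicated library is usually preferred over a hand-rolled filter.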