ggerganov / kbd-audio

🎤⌨️ Acoustic keyboard eavesdropping
https://ggerganov.github.io/keytap
MIT License
8.55k stars 588 forks

[Idea] Compute key similarity over the log-scale Mel spectrogram #49

Open ggerganov opened 2 years ago

ggerganov commented 2 years ago

Currently, we compute the cross-correlation between the time-domain key waveforms to determine how similar two keys are. Instead, we could compute the similarity metric over the Mel spectrograms of the signals. The Mel spectrogram seems to be the go-to audio representation in modern state-of-the-art speech recognition algorithms, so why not give it a try in keytap?
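
As a rough illustration of the idea, the sketch below scores two keys by taking the Pearson correlation of their flattened log-Mel spectrograms. This is not the code keytap uses: the `Spectrogram` alias and the `spectrogram_similarity` name are made up for this example, and it assumes both spectrograms have the same number of frames and Mel bands and are already aligned in time.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container: one log-Mel spectrogram stored row-major as
// n_frames * n_mel values.
using Spectrogram = std::vector<float>;

// Pearson correlation between two equally sized spectrograms.
// Returns a value in [-1, 1]. Returns 0 if the sizes differ, the input is
// empty, or either input has zero variance.
double spectrogram_similarity(const Spectrogram & a, const Spectrogram & b) {
    if (a.size() != b.size() || a.empty()) {
        return 0.0;
    }

    const double n = static_cast<double>(a.size());

    // First pass: means
    double sum_a = 0.0;
    double sum_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum_a += a[i];
        sum_b += b[i];
    }
    const double mean_a = sum_a / n;
    const double mean_b = sum_b / n;

    // Second pass: covariance and variances
    double cov = 0.0;
    double var_a = 0.0;
    double var_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double da = a[i] - mean_a;
        const double db = b[i] - mean_b;
        cov   += da * db;
        var_a += da * da;
        var_b += db * db;
    }

    if (var_a == 0.0 || var_b == 0.0) {
        return 0.0;
    }
    return cov / std::sqrt(var_a * var_b);
}
```

Removing the mean makes the score insensitive to a constant offset in the log domain, i.e. to an overall gain difference between the two recordings. Whether that is desirable for key classification would need to be checked experimentally.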

Here is a sample implementation that computes the log-scaled Mel spectrogram of an audio signal, which I recently wrote for the whisper.cpp project:

https://github.com/ggerganov/whisper.cpp/blob/6d654d192a62e6cd9897d6ff683bdc97406827e9/main.cpp#L1962-L2063
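
For readers who do not want to follow the link, here is a much simplified, self-contained sketch of the same kind of computation: Hann window, power spectrum, triangular Mel filters, then `log10`. It is not the whisper.cpp code. The function name, the parameters, the naive O(N^2) DFT and the on-the-fly filterbank are all assumptions made to keep the example short.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

static double hz_to_mel(double hz)  { return 2595.0 * std::log10(1.0 + hz / 700.0); }
static double mel_to_hz(double mel) { return 700.0 * (std::pow(10.0, mel / 2595.0) - 1.0); }

// Hypothetical helper: returns one row of n_mel log10 Mel energies per frame.
std::vector<std::vector<float>> log_mel_spectrogram(
        const std::vector<float> & samples, int sample_rate,
        int frame_size, int frame_step, int n_mel) {
    const double pi = 3.14159265358979323846;
    const int n_bins = frame_size / 2 + 1;

    // Hann window
    std::vector<double> hann(frame_size);
    for (int i = 0; i < frame_size; ++i) {
        hann[i] = 0.5 * (1.0 - std::cos(2.0 * pi * i / frame_size));
    }

    // Triangular filters whose edges are equally spaced on the Mel scale
    // between 0 Hz and the Nyquist frequency
    std::vector<double> edges_hz(n_mel + 2);
    const double mel_max = hz_to_mel(sample_rate / 2.0);
    for (int m = 0; m < n_mel + 2; ++m) {
        edges_hz[m] = mel_to_hz(mel_max * m / (n_mel + 1));
    }
    std::vector<std::vector<double>> filters(n_mel, std::vector<double>(n_bins, 0.0));
    for (int m = 0; m < n_mel; ++m) {
        const double lo = edges_hz[m], mid = edges_hz[m + 1], hi = edges_hz[m + 2];
        for (int k = 0; k < n_bins; ++k) {
            const double f    = static_cast<double>(k) * sample_rate / frame_size;
            const double up   = (f - lo) / (mid - lo);
            const double down = (hi - f) / (hi - mid);
            filters[m][k] = std::max(0.0, std::min(up, down));
        }
    }

    std::vector<std::vector<float>> out;
    const std::size_t fsize = static_cast<std::size_t>(frame_size);
    for (std::size_t start = 0; start + fsize <= samples.size(); start += frame_step) {
        // Power spectrum of the windowed frame (naive DFT, for clarity only)
        std::vector<double> power(n_bins);
        for (int k = 0; k < n_bins; ++k) {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < frame_size; ++n) {
                const double x     = hann[n] * samples[start + n];
                const double phase = 2.0 * pi * k * n / frame_size;
                re += x * std::cos(phase);
                im -= x * std::sin(phase);
            }
            power[k] = re * re + im * im;
        }

        // Apply the Mel filters and take log10, clamping to avoid log(0)
        std::vector<float> row(n_mel);
        for (int m = 0; m < n_mel; ++m) {
            double energy = 0.0;
            for (int k = 0; k < n_bins; ++k) {
                energy += filters[m][k] * power[k];
            }
            row[m] = static_cast<float>(std::log10(std::max(energy, 1e-10)));
        }
        out.push_back(row);
    }
    return out;
}
```

Key presses are short transients, so the frame size and step that work well for speech may be too coarse here. Treat any specific values as something to tune.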