dpwe / audfprint

Landmark-based audio fingerprinting
MIT License
547 stars, 123 forks

Shorter query question #25

Closed: fengliu2014 closed this issue 7 years ago

fengliu2014 commented 7 years ago

I was running some tests with the code. The only difference is that the query is now 1 s long. To make it work, I increased the landmark density and shrank the target zone (63 bins and 15 symbols). Even without noise or degradation, the recognition rate is just 92%. Is this expected, given that the code is designed mainly for queries of at least 5 s?

dpwe commented 7 years ago

I think it takes on the order of seconds for the adaptive threshold to settle. Perhaps a more sophisticated initialization scheme would give better results on short examples. What happens if you repeat the short stimuli 2 or 3 times?

DAn.
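
The repetition experiment suggested above can be tried by looping the short query before fingerprinting it. The following is only a minimal sketch: the helper name `repeat_query`, the 11025 Hz sample rate, and the synthetic sine-wave query are assumptions made for illustration, and nothing here calls audfprint itself.

```python
import numpy as np

def repeat_query(samples, times=3):
    """Return the mono signal `samples` concatenated with itself `times` times.

    Hypothetical helper for illustration; it is not part of audfprint.
    """
    samples = np.asarray(samples)
    if samples.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if times < 1:
        raise ValueError("times must be >= 1")
    return np.tile(samples, times)

# Assumed sample rate and a synthetic 1 s stand-in for the real query audio.
sr = 11025
t = np.arange(sr) / sr
query = np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

# Loop the 1 s query three times to get a 3 s query.
repeated = repeat_query(query, times=3)
print(len(query) / sr, len(repeated) / sr)
```

The looped signal would then be written to a file and used as the query in place of the 1 s original, which gives the adaptive threshold more audio to settle on.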


fengliu2014 commented 7 years ago

Thank you DAn. After repeating the stimuli, the recognition rate is better, 95%, but still not 100%.

By the way, does this algorithm only work when the training dataset contains the exact query audio? I am trying to build an audio fingerprinting system for speech. I have a dataset consisting of many short speech recordings (about 2 s each), all produced by the same person. He then speaks again, and I use that new recording as the query audio. The training dataset contains several files (>=10) with the same content as the query audio, where 'same content' means the files contain the same words. I am hoping to find one of them, but the performance is poor, about 20%.

dpwe commented 7 years ago

Audio fingerprinting only matches exact instances (recordings) of waveforms. Even when the same person repeats a word, there are slight differences in the frequencies and timing which alter the landmarks upon which the fingerprinting relies.
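
To illustrate why small differences break the match, here is a toy sketch of a landmark hash built from a pair of spectral peaks. The function `landmark_hash`, its bit layout, and the peak values are assumptions chosen for illustration and should not be read as audfprint's actual implementation. The point is only that the hash is an exact key, so moving a peak by a single frequency bin or time frame yields a different key.

```python
def landmark_hash(f1_bin, f2_bin, dt_frames):
    """Pack a peak pair into one integer key (illustrative layout only).

    f1_bin    -- frequency bin of the first peak (8 bits kept)
    f2_bin    -- frequency bin of the second peak
    dt_frames -- time difference between the peaks in frames (6 bits kept)
    """
    df = (f2_bin - f1_bin) & 0x3F  # frequency difference, 6 bits
    return ((f1_bin & 0xFF) << 12) | (df << 6) | (dt_frames & 0x3F)

# The same recording always yields the same peaks, so the keys agree.
original = landmark_hash(40, 52, 9)
same_recording = landmark_hash(40, 52, 9)

# A second utterance of the same word: one peak lands a bin higher,
# or the second peak arrives one frame later.
freq_shifted = landmark_hash(41, 52, 9)
time_shifted = landmark_hash(40, 52, 10)

print(original == same_recording)  # True
print(original == freq_shifted)    # False
print(original == time_shifted)    # False
```

Since matching relies on many such keys agreeing exactly and at a consistent time offset, a re-spoken phrase shares very few of them with the stored recordings.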