dpwe / audfprint

Landmark-based audio fingerprinting
MIT License

Any way to force a minimum # peaks/second? #10

Open bengrosser opened 9 years ago

bengrosser commented 9 years ago

I've been using audfprint for spectral peak analysis, to understand how and what Shazam et al. see as "peaks" within a given track. So I've mostly been using the precompute command with the -K option to produce peak files, and then extracting the (time, frequency) pairs from the .afpk files to use elsewhere.
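For anyone doing the same extraction, here is a minimal sketch of reading an .afpk file into (seconds, Hz) pairs. It rests on assumptions you should check against audfprint_analyze.py in your own checkout: that the file starts with the magic header b"audfprintpeakV00", that each peak follows as a little-endian pair of 32-bit ints (time frame, frequency bin), and that the analysis used a 11025 Hz sample rate, a 512-point FFT and a 256-sample hop. The helper names are hypothetical, not part of audfprint.

```python
import struct
from pathlib import Path

# ASSUMED .afpk layout and analysis settings (verify against your audfprint copy).
PEAK_MAGIC = b"audfprintpeakV00"  # assumed fixed header
PEAK_FMT = "<2i"                  # assumed: little-endian int32 (time_frame, freq_bin)
SR, N_FFT, N_HOP = 11025, 512, 256


def parse_peaks(data):
    """Parse raw .afpk bytes into a list of (time_frame, freq_bin) tuples."""
    if not data.startswith(PEAK_MAGIC):
        raise ValueError("unexpected header: the assumed .afpk layout may be wrong")
    body = data[len(PEAK_MAGIC):]
    if len(body) % struct.calcsize(PEAK_FMT):
        raise ValueError("trailing bytes: the assumed record size may be wrong")
    return list(struct.iter_unpack(PEAK_FMT, body))


def to_seconds_hz(peaks):
    """Convert frame/bin indices to (seconds, Hz) under the assumed settings."""
    return [(t * N_HOP / SR, b * SR / N_FFT) for t, b in peaks]


def load_afpk(path):
    """Read one peak file and return its peaks as (seconds, Hz) pairs."""
    return to_seconds_hz(parse_peaks(Path(path).read_bytes()))
```

Usage would look like `pairs = load_afpk("mytrack.afpk")`, where the filename is hypothetical. If either ValueError fires, the assumed layout does not match your version of the file format.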

I'm noticing that, especially at relatively low densities (say, 10 hashes/sec), I can end up with no peaks detected for long stretches. On one test track, for example, I get no peaks at all for stretches of 4 seconds or more, even though the audio there has perceptible activity.
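To locate these empty stretches systematically, a small helper can scan a list of peak times (in seconds) and report every gap longer than a chosen length. This is a hypothetical utility, not part of audfprint; it assumes you have already converted peak frames to seconds and know the track's start and end times.

```python
def find_gaps(peak_times, min_gap, start=0.0, end=None):
    """Return (gap_start, gap_end) intervals longer than min_gap seconds
    that contain no peaks. start/end bound the track; if end is omitted,
    the last peak time is used, so a trailing gap is not reported."""
    times = sorted(peak_times)
    if end is None:
        end = times[-1] if times else start
    # Treat the track boundaries as virtual "peaks" so that leading and
    # trailing silences are reported too.
    edges = [start] + times + [end]
    return [(a, b) for a, b in zip(edges, edges[1:]) if b - a > min_gap]
```

For example, `find_gaps([0.5, 1.0, 6.0, 6.5], 4.0, end=12.0)` reports the 1.0 to 6.0 s and 6.5 to 12.0 s stretches.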

So I have two questions about this.

First, is there any way to force a higher sensitivity without requiring an overall higher density? In other words, at a density of 10 hashes/sec I don't need any more peaks in the dense areas, but I would like more peaks detected in the lulls. Can I ask for a minimum number of hashes/sec?

Second, do Shazam et al. ever let such a long stretch go without recorded peaks? My intuition from playing with it is no, but I was surprised to see so few peaks detected.

dpwe commented 9 years ago

> First, is there any way to force a higher sensitivity without requiring an overall higher density?

Peak picking works with a decaying threshold: a spectral peak is only considered as a landmark when it pokes through the threshold, at which point the local threshold is raised to the level of this new peak. "Density" simply modulates how quickly this threshold decays, and hence how soon another peak can poke through.
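The decaying-threshold idea can be illustrated with a deliberately simplified picker. This is a sketch of the general technique, not audfprint's code: `decay` and `spread_bins` are hypothetical parameters (audfprint derives its decay from --density and spreads its threshold with its own formulas), and the input is assumed to be a list of magnitude spectra, one per frame.

```python
import math


def pick_peaks(frames, decay=0.98, spread_bins=2.0):
    """Simplified decaying-threshold peak picker (illustrative sketch only).

    frames: list of spectra, one per time step, each a list of magnitudes.
    decay: per-frame multiplier on the threshold; values closer to 1 decay
        more slowly and so yield fewer landmarks (a stand-in for density).
    spread_bins: width in bins of the Gaussian used to raise the threshold
        around each accepted peak.
    Returns a list of (time_frame, freq_bin) landmarks.
    """
    n_bins = len(frames[0])
    thresh = [0.0] * n_bins
    landmarks = []
    for t, spec in enumerate(frames):
        # Local maxima of this frame, strongest first.
        cands = [
            b for b in range(n_bins)
            if (b == 0 or spec[b] > spec[b - 1])
            and (b == n_bins - 1 or spec[b] >= spec[b + 1])
        ]
        cands.sort(key=lambda b: spec[b], reverse=True)
        for b in cands:
            if spec[b] > thresh[b]:  # the peak "pokes through"
                landmarks.append((t, b))
                # Raise the local threshold to this peak's level, tapering
                # off with a Gaussian over neighbouring bins.
                for k in range(n_bins):
                    g = spec[b] * math.exp(-0.5 * ((k - b) / spread_bins) ** 2)
                    thresh[k] = max(thresh[k], g)
        # Let the threshold decay before the next frame.
        thresh = [x * decay for x in thresh]
    return landmarks
```

With one loud frame `[0, 10, 0, 0, 0, 0]` followed by five frames of a quiet peak `[0, 0, 0, 0, 1, 0]` and `spread_bins=10.0`, `decay=0.5` lets the quiet peak through after three frames of dead time, while `decay=0.9` keeps it suppressed for all five frames.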

As a result, quiet stretches that follow loud stretches will, in general, receive few landmarks, because the threshold set by the loud passage takes time to decay. Turning up the density will shorten this "dead time". 10 hashes/sec is a low density, so you can't expect anything like a comprehensive collection of peaks at that level.
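The size of this dead time can be estimated with a little arithmetic. Under a simple model in which the threshold is multiplied by a constant factor d each frame (an assumption; audfprint's actual decay schedule differs in detail), a threshold set by a loud peak of level A falls below a quieter level B after the smallest n with A·d^n < B, i.e. n > ln(B/A) / ln(d). The function name is hypothetical.

```python
import math


def dead_time_frames(loud, quiet, decay):
    """Frames until a threshold set by a `loud` peak decays below a `quiet`
    peak's level, assuming multiplication by `decay` (0 < decay < 1) per frame."""
    if quiet >= loud:
        return 0  # the quiet peak already clears the threshold
    # Smallest integer n strictly greater than ln(quiet/loud) / ln(decay).
    return math.floor(math.log(quiet / loud) / math.log(decay)) + 1
```

For a 100:1 level drop, `dead_time_frames(100, 1, 0.99)` gives 459 frames while `dead_time_frames(100, 1, 0.95)` gives 90; at an assumed hop of 256 samples at 11025 Hz that is roughly 10.7 s versus 2.1 s, which is why a faster decay (higher density) shortens the empty stretches.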

Note also --pks-per-frame: at each time step, only the first few new landmarks are recorded. The default value (5) typically discards lots of landmarks, so if you're trying to find something closer to all the candidate landmarks, you'll want to increase it.
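To get a feel for how much such a per-frame cap throws away, the sketch below applies it as a post-filter to a list of candidate landmarks and counts what is dropped. It is an illustration under two assumptions: that "first" means strongest, and that the cap can be treated independently of the threshold (in a real picker, a discarded peak presumably also does not raise the threshold). The function name is hypothetical.

```python
from collections import defaultdict


def cap_per_frame(candidates, pks_per_frame=5):
    """Keep at most pks_per_frame candidates per time frame, strongest first.

    candidates: iterable of (time_frame, freq_bin, magnitude).
    Returns (kept, n_discarded); kept is a sorted list of (time_frame, freq_bin).
    """
    by_frame = defaultdict(list)
    for t, b, mag in candidates:
        by_frame[t].append((mag, b))
    kept, dropped = [], 0
    for t, cands in by_frame.items():
        cands.sort(reverse=True)  # strongest first (assumed priority order)
        kept.extend((t, b) for _, b in cands[:pks_per_frame])
        dropped += max(0, len(cands) - pks_per_frame)
    return sorted(kept), dropped
```

With eight candidates in one frame, the default cap of 5 drops three of them; raising the cap to 10 keeps all eight.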

Finally, this is not the Shazam algorithm. It's my own implementation, informed by Avery Wang's paper describing the Shazam technology; in particular, details such as exactly how peaks are formed from the spectrogram, and how peak density is controlled with --density and --pks-per-frame, will only resemble what Shazam does by pure coincidence.

DAn.
