adblockradio / stream-audio-fingerprint

Audio landmark fingerprinting as a Node Stream module
Mozilla Public License 2.0

Possible ways of reducing complexity #14

Open DonaldTsang opened 5 years ago

DonaldTsang commented 5 years ago
  1. According to https://en.wikipedia.org/wiki/Just-noticeable_difference, humans can only notice tone differences of about 10 cents (1/10 of a semitone) or more, so would it be possible to round tone differences to an integer number of 10-cent steps instead of storing a float? (A rough sketch of what I mean by 1. and 2. follows this list.)
  2. According to https://www.reddit.com/r/askscience/comments/5dpu0z/what_is_the_fastest_beats_per_minute_we_can_hear/, humans can only perceive beats up to about 1500-1800 BPM (25-30 beats per second), so would it be possible to round the time difference to an integer number of 40 ms steps (or 50 ms, or 20-25 ms for higher accuracy) instead of storing a float?
  3. According to https://en.wikipedia.org/wiki/List_of_chords, most chords have fewer than 8 notes, so would adding f3~f6 make everything more accurate?
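
A rough sketch of what I mean by 1. and 2., in plain Node.js (the reference pitch, step sizes and function names are just my assumptions, not anything taken from this repo):

```js
// Hypothetical helpers illustrating points 1 and 2; names and constants are assumptions.

// Convert a frequency in Hz to an integer number of 10-cent steps relative to a
// reference pitch (here A4 = 440 Hz). 1 semitone = 100 cents.
function hzToTenCentSteps(freqHz, refHz = 440) {
  const cents = 1200 * Math.log2(freqHz / refHz);
  return Math.round(cents / 10);
}

// Quantize a time difference in milliseconds to the nearest 40 ms bucket.
function quantizeDtMs(dtMs, bucketMs = 40) {
  return Math.round(dtMs / bucketMs);
}

console.log(hzToTenCentSteps(466.16)); // 10 (one semitone above A4)
console.log(quantizeDtMs(130));        // 3 (i.e. ~120 ms)
```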

P.S. It would be great to have an awesome-audio-fingerprint list that shows projects using this repo.

dest4 commented 5 years ago

1) The purpose of acoustic fingerprinting is not necessarily to distinguish tracks that sound different to human ears. It's to recognize a sound we already know, given the difficulties of numerical spectral estimation and potentially added noise. Note that we use an FFT here, so equal tone steps translate into exponentially increasing Hz steps. Your point would work better in a context using e.g. Mel-scale spectra.
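
For illustration, here is a small sketch of why a linear FFT grid does not map cleanly onto cent steps; the sample rate and FFT size below are assumed values for the example, not necessarily the ones used in this repo:

```js
// With a linear FFT grid, the width of one bin measured in musical cents
// depends on where you are in the spectrum.
// Assumed parameters: 11025 Hz sample rate, 512-point FFT (bin width ~21.5 Hz).
const sampleRate = 11025;
const fftSize = 512;
const binHz = sampleRate / fftSize;

function binWidthInCents(binIndex) {
  const f = binIndex * binHz;
  return 1200 * Math.log2((f + binHz) / f);
}

console.log(binWidthInCents(5).toFixed(1));   // ~315.6 cents around ~108 Hz
console.log(binWidthInCents(200).toFixed(1)); // ~8.7 cents around ~4.3 kHz
```

Around 100 Hz a single bin already spans several semitones, so rounding to 10-cent steps would not add any precision there.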

2) and 3) I don't get your points.

Do not hesitate to play with the landmark parameters and report your findings here.

DonaldTsang commented 5 years ago

Sorry for not explaining things properly.

  1. I am asking whether reducing exact Hz values to piano keys would be a good trade-off to save space and database search time. Also, I just realized that standardizing piano keys to A440 would make music tuned to 432 Hz match more weakly; couple that with the fact that microtonal music exists, and there has to be a way of rounding tones up or down to fit the chromatic scale (or microtonal scales, if you want accuracy). If we go the microtonal route, the minimum human-recognizable tonal difference of 12-25 cents (a scale of 48-100 notes per octave) would have to be decided upon. (A rough sketch of what I mean follows this list.)
  2. I am asking what the "anchor points" (Shazam paper reference) of the software are, and whether it is possible to reduce the time distance between two notes to a single small integer at a coarser resolution than the current sub-millisecond precision (possibly 20~50 millisecond intervals) to save space and database search time (credit to Adam Neely's "Fastest Music" video).
  3. I am asking what the "anchor chords" of the software are, and whether it is necessary to add extra anchor points based on chords that have more than 3 notes (where st, f1 and f2 are not enough, requiring f3~f7). See https://en.wikipedia.org/wiki/List_of_chords
  4. (Extra question) What are the minimum and maximum frequency intervals of the "Target Zone" (Shazam paper reference)? One octave? Two octaves? (On Wikipedia, people have considered intervals of up to four octaves, which is not really useful in real life.)
  5. (Extra question) What is the maximum time difference of the "Target Zone" (Shazam paper reference)? One second? Two seconds? (Any music slower than two seconds per beat is "too slow to be useful", a reference to Adam Neely.)
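
To make points 1, 2, 4 and 5 concrete, here is the kind of quantized anchor/target-zone pairing I have in mind, as a sketch; the piano-key mapping, 25 ms frames and the two-octave / 2-second bounds are placeholder values, not this repo's actual parameters:

```js
// Hypothetical illustration: pair an anchor peak with peaks in a bounded target zone,
// quantizing pitch to piano keys (MIDI note numbers) and time offsets to 25 ms frames.
const A4 = 440;
const hzToKey = (hz) => Math.round(69 + 12 * Math.log2(hz / A4)); // MIDI note number
const msToFrame = (ms) => Math.round(ms / 25);

// Target zone bounds (assumed values, up for discussion):
const MAX_DT_MS = 2000;   // item 5: cap the time difference at 2 s
const MAX_INTERVAL = 24;  // item 4: cap the pitch interval at two octaves (24 semitones)

function pairLandmarks(anchor, peaks) {
  // anchor and peaks are { hz, ms } objects
  return peaks
    .filter((p) => p.ms > anchor.ms && p.ms - anchor.ms <= MAX_DT_MS)
    .filter((p) => Math.abs(hzToKey(p.hz) - hzToKey(anchor.hz)) <= MAX_INTERVAL)
    .map((p) => ({
      key1: hzToKey(anchor.hz),
      key2: hzToKey(p.hz),
      dt: msToFrame(p.ms - anchor.ms),
    }));
}

console.log(pairLandmarks({ hz: 440, ms: 0 }, [{ hz: 660, ms: 300 }, { hz: 5000, ms: 100 }]));
// -> [ { key1: 69, key2: 76, dt: 12 } ]  (the 5 kHz peak falls outside the interval bound)
```
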
DonaldTsang commented 5 years ago

@dest4 As I asked in https://github.com/worldveil/dejavu/issues/199, are these ways of reducing complexity viable (due to the constraints of human hearing)?