JorenSix / Panako

The Panako acoustic fingerprinting system.
GNU Affero General Public License v3.0

PanakoStrategy query logic Line 280 - printMap overwrites frequency and time values? #34

Open lucaslawes opened 2 years ago

lucaslawes commented 2 years ago

Possible minor refactoring to improve the recognition rate.

Issue: Initial testing against a set of audio tracks shows that a fingerprint pattern (hash, f1, t1) is sometimes repeated, but more often the hash is repeated while f1/t1 differ.

In the current application logic, printMap is a HashMap keyed by hash, so a later print with the same hash overwrites an earlier one. The f1/t1 information of the overwritten print is lost, which results in slightly less accurate recognition.

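//query
for(PanakoFingerprint print : prints) {
    long hash = print.hash();
    db.addToQueryQueue(hash);
    // a later print with the same hash replaces the earlier one here
    printMap.put(hash, print);
}
...
// only the last print stored for this hash is recovered
hit.queryTime = printMap.get(fingerprintHash).t1;
hit.queryF1 = printMap.get(fingerprintHash).f1;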
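A minimal, self-contained sketch of the overwrite, assuming a hypothetical `Print` class that holds only the hash, t1 and f1 fields (it is not Panako's `PanakoFingerprint`). A `HashMap<Long, Print>` keeps one print per hash, while a map from hash to a list of prints (a multimap) keeps every (t1, f1) pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PanakoFingerprint: only the fields discussed in this issue.
final class Print {
    final long hash;
    final int t1;
    final int f1;

    Print(long hash, int t1, int f1) {
        this.hash = hash;
        this.t1 = t1;
        this.f1 = f1;
    }
}

final class PrintMapDemo {
    // Mirrors the current logic: one entry per hash, later prints overwrite earlier ones.
    static Map<Long, Print> singleValueMap(List<Print> prints) {
        Map<Long, Print> printMap = new HashMap<>();
        for (Print print : prints) {
            printMap.put(print.hash, print); // replaces any earlier print with this hash
        }
        return printMap;
    }

    // Alternative: keep every print (and so every t1/f1) that produced a given hash.
    static Map<Long, List<Print>> multiValueMap(List<Print> prints) {
        Map<Long, List<Print>> printMap = new HashMap<>();
        for (Print print : prints) {
            printMap.computeIfAbsent(print.hash, h -> new ArrayList<>()).add(print);
        }
        return printMap;
    }
}
```

With two prints sharing hash 42 (t1 = 10 and t1 = 57), `singleValueMap(...).get(42L).t1` is 57 and the first print is gone, whereas `multiValueMap(...).get(42L)` still holds both.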

Suggestion: Pass the entire fingerprint to the db queue, extend the PanakoHit class to support queryTime and queryF1, set them when processing the db queue, and do away with the printMap.

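//query
for(PanakoFingerprint print : prints) {
    // queue the whole fingerprint so its t1/f1 travel with the hash
    db.addToQueryQueue(print);
}
...
// the query-side values now come from the hit itself instead of printMap
hit.queryTime = dbHit.queryT1;
hit.queryF1 = dbHit.queryF1;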
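A rough sketch of the suggested flow, using hypothetical classes (`QueryPrint`, `DbHit`, `QueryQueueDemo`) and an in-memory index in place of Panako's real storage. The whole print is queued, and each hit is built from the queued print, so prints that share a hash keep their own t1/f1 and no printMap lookup is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query fingerprint (stand-in for PanakoFingerprint).
final class QueryPrint {
    final long hash;
    final int t1;
    final int f1;

    QueryPrint(long hash, int t1, int f1) {
        this.hash = hash;
        this.t1 = t1;
        this.f1 = f1;
    }
}

// Hypothetical hit that carries the query-side t1/f1 along with the matched time.
final class DbHit {
    final long hash;
    final int queryT1;
    final int queryF1;
    final int matchT1;

    DbHit(long hash, int queryT1, int queryF1, int matchT1) {
        this.hash = hash;
        this.queryT1 = queryT1;
        this.queryF1 = queryF1;
        this.matchT1 = matchT1;
    }
}

// Hypothetical in-memory storage: hash -> stored t1 values of reference prints.
final class QueryQueueDemo {
    private final Map<Long, List<Integer>> index = new HashMap<>();
    private final List<QueryPrint> queue = new ArrayList<>();

    void store(long hash, int t1) {
        index.computeIfAbsent(hash, h -> new ArrayList<>()).add(t1);
    }

    // The whole fingerprint is queued, not just its hash.
    void addToQueryQueue(QueryPrint print) {
        queue.add(print);
    }

    // Builds one hit per (queued print, stored match) pair, then empties the queue.
    List<DbHit> processQueryQueue() {
        List<DbHit> hits = new ArrayList<>();
        for (QueryPrint print : queue) {
            for (int matchT1 : index.getOrDefault(print.hash, List.of())) {
                hits.add(new DbHit(print.hash, print.t1, print.f1, matchT1));
            }
        }
        queue.clear();
        return hits;
    }
}
```

The cost of this design is that every repeated query hash now produces its own set of hits, which is the storage and performance concern raised in the reply below the suggestion.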
JorenSix commented 2 years ago

Hi, thanks for the suggestion.

There are two reasons for not allowing duplicate hashes (the same choice is reflected on the storage side; this is essentially the same issue as #37):

If a hash is common, it (almost by definition) has little discriminative power. The idea implemented here is that such hashes can be safely ignored.
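To make the "common hashes carry little information" idea concrete, here is a sketch of an explicit frequency filter. This is a different technique from what the issue describes Panako doing (keeping one print per hash via the map); the class name and the `maxOccurrences` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CommonHashFilter {
    // Drops every hash that occurs more than maxOccurrences times in the query;
    // maxOccurrences is a hypothetical tuning parameter.
    static List<Long> dropCommonHashes(List<Long> hashes, int maxOccurrences) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long hash : hashes) {
            counts.merge(hash, 1, Integer::sum); // count occurrences per hash
        }
        List<Long> kept = new ArrayList<>();
        for (long hash : hashes) {
            if (counts.get(hash) <= maxOccurrences) {
                kept.add(hash); // rare enough to be discriminative
            }
        }
        return kept;
    }
}
```

For example, with hashes `[1, 1, 1, 2, 3, 3]` and a threshold of 2, hash 1 is dropped and `[2, 3, 3]` remains.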

The second reason is performance: storage space and computation are not wasted on hashes with little discriminative power. While some hash collisions are allowed, too many of them can degrade query performance.

However, letting users choose would indeed be a good improvement. For small collections or powerful servers, collisions may not be that big a problem. Storing the temporary query prints in either a Set (to avoid duplicates) or an array (to allow them) could be a way to do this.
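One way such a choice might look, as a sketch only: `allowDuplicateHashes` is a hypothetical option, not an existing Panako setting, and `TempPrint` is a hypothetical stand-in for the fingerprint class. A `LinkedHashMap` keyed by hash plays the role of the Set (one print per hash, and like the current HashMap the last print wins), while a plain list keeps every print.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for PanakoFingerprint.
final class TempPrint {
    final long hash;
    final int t1;
    final int f1;

    TempPrint(long hash, int t1, int f1) {
        this.hash = hash;
        this.t1 = t1;
        this.f1 = f1;
    }
}

final class QueryPrintCollector {
    // allowDuplicateHashes is a hypothetical user-facing option.
    static List<TempPrint> collect(List<TempPrint> prints, boolean allowDuplicateHashes) {
        if (allowDuplicateHashes) {
            // List behaviour: every print is kept, including repeated hashes.
            return new ArrayList<>(prints);
        }
        // Set-like behaviour: one print per hash, first-seen order, last print wins.
        Map<Long, TempPrint> byHash = new LinkedHashMap<>();
        for (TempPrint print : prints) {
            byHash.put(print.hash, print);
        }
        return new ArrayList<>(byHash.values());
    }
}
```

Deduplication keeps the current memory and lookup cost; the list variant trades that cost for keeping every (t1, f1) pair, which matters most for small collections.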