JorenSix / Panako

The Panako acoustic fingerprinting system.
GNU Affero General Public License v3.0

PanakoStrategy Query logic - mostCommonDeltaTforHitList selects most common delta time by position in HashMap #35

Open lucaslawes opened 2 years ago

lucaslawes commented 2 years ago

Possible minor refactoring to improve the recognition rate.

Observation: A test against a set of audio tracks showed that the PanakoStrategy.mostCommonDeltaTforHitList method returns the delta time with the highest number of occurrences. However, when more than one delta time shares that highest count, the delta time returned depends on where it is stored in the HashMap rather than on a deliberate rule.

For example, for delta times x, y and z:

x with 1 occurrence
y with 2 occurrences
z with 3 occurrences

z is returned. But if the x, y and z delta times all have the same number of occurrences, which one is returned is effectively arbitrary: it depends on their position in the HashMap.
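To make the effect concrete, here is a minimal sketch of the pattern described above. It is not Panako's actual code: the class and method names are hypothetical, plain integers stand in for the match objects, and a LinkedHashMap is used instead of a HashMap purely so that the iteration order can be controlled and the outcome reproduced.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not taken from the Panako code base.
class TieOrderDemo {

    // Counts each delta, then returns the first key that reaches the highest
    // count in map iteration order. Ties are never broken explicitly.
    static int firstMaxInIterationOrder(List<Integer> deltas) {
        // LinkedHashMap iterates in insertion order, which makes the effect reproducible.
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (int delta : deltas) {
            counts.merge(delta, 1, Integer::sum);
        }
        int best = 0;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // strict '>' keeps the first of several tied keys
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same deltas with the same counts, stored in a different order: the winner changes.
        System.out.println(firstMaxInIterationOrder(List.of(7, 7, 3, 3))); // prints 7
        System.out.println(firstMaxInIterationOrder(List.of(3, 3, 7, 7))); // prints 3
    }
}
```

With a HashMap the iteration order follows the hashing and table layout instead of insertion order, but the consequence is the same: the result for tied counts is decided by storage order, not by the data.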

Suggestion: I don't know what mathematical decision is most suitable, but I tried taking the lowest delta time and this improved the recognition rate of the query. This is my quickly written alternative implementation of mostCommonDeltaTforHitList:

var timeDeltas = new ArrayList<TimeDeltaInfo>();

for(var queryMatch : queryMatches) {
  var foundTimeDelta = false;
  for(var timeDelta : timeDeltas) {
    if(timeDelta.timeDelta == queryMatch.getTimeDelta()) {
      timeDelta.count++;
      foundTimeDelta = true;
      break;
    }
  }
  if(!foundTimeDelta) {
    timeDeltas.add(new TimeDeltaInfo(queryMatch.getTimeDelta()));
  }
}

// Get the time deltas with the highest number of occurrences at the top
timeDeltas.sort((a, b) -> b.count - a.count);

// Don't know what to do if we have no time deltas, so just return zero for now
if(timeDeltas.size() == 0) {
  return 0;
}

var topTimeDelta = timeDeltas.get(0);
var topTimeDeltas = new ArrayList<TimeDeltaInfo>();
topTimeDeltas.add(topTimeDelta);

for(var i = 1; i < timeDeltas.size(); i++) {
  if(timeDeltas.get(i).count == topTimeDelta.count) {
    topTimeDeltas.add(timeDeltas.get(i));
  }
}

// Get the smallest time delta as it seems to work better than taking the average or the largest time delta.
topTimeDeltas.sort((a, b) -> Integer.compare(a.timeDelta, b.timeDelta));

return topTimeDeltas.get(0).timeDelta;  
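
For comparison, the same rule (highest count wins, smallest delta on ties) can be sketched in a single pass over a count map. This is an illustrative sketch only: the class name is hypothetical, plain integers stand in for Panako's match objects, and returning 0 for empty input simply mirrors the placeholder above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Panako code base.
class SmallestDeltaTieBreak {

    // Returns the most frequent delta; among equally frequent deltas, the smallest one.
    // Returns 0 for an empty list (a placeholder choice, as in the code above).
    static int mostCommonDeltaT(List<Integer> deltas) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int delta : deltas) {
            counts.merge(delta, 1, Integer::sum);
        }
        int best = 0;
        int bestCount = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            int delta = e.getKey();
            int count = e.getValue();
            boolean moreFrequent = count > bestCount;
            boolean tiedButSmaller = count == bestCount && delta < best;
            if (moreFrequent || tiedButSmaller) {
                best = delta;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Because the tie is resolved by comparing the deltas themselves, the result no longer depends on the order in which the map happens to iterate.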

JorenSix commented 2 years ago

Hi, thanks for the suggestion.

For a 'real hit' I would expect one time delta to be much more common than any other time delta. So much so that several time deltas having the same number of occurrences should be relatively uncommon (but I did not test this thoroughly).

In any case, the current behaviour when several time deltas have the same number of occurrences (relatively random, based on HashMap position) is indeed not ideal.

The smallest time delta might be a reasonable heuristic. I am considering whether there are others which might also be reasonable.
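
One way to check how often such ties actually occur would be to count, per hit list, how many distinct time deltas share the top count. The following is only a sketch under assumptions: the class and method names are hypothetical, and plain integers stand in for the time deltas of Panako's match objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic for illustration; not part of the Panako code base.
class DeltaTieStats {

    // Returns how many distinct deltas share the highest occurrence count.
    // 1 means a clear winner, more than 1 means a tie, 0 means empty input.
    static int deltasSharingTopCount(List<Integer> deltas) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (int delta : deltas) {
            counts.merge(delta, 1, Integer::sum);
        }
        int topCount = 0;
        int sharing = 0;
        for (int count : counts.values()) {
            if (count > topCount) {
                topCount = count; // new highest count, reset the tally
                sharing = 1;
            } else if (count == topCount) {
                sharing++;        // another delta with the same highest count
            }
        }
        return sharing;
    }
}
```

Logging this value for a test set would show whether ties are as rare for real hits as expected.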