dpwe / audfprint

Landmark-based audio fingerprinting
MIT License

Matching fails with ValueError: The first argument of bincount must be non-negative #21

Closed: dpwe closed this issue 7 years ago

dpwe commented 7 years ago

Dan Schultz reports:

I wanted to let you know I'm seeing an error from audfprint:

Traceback (most recent call last):
  File "/usr/local/bin/audfprint/audfprint.py", line 482, in <module>
    main(sys.argv)
  File "/usr/local/bin/audfprint/audfprint.py", line 465, in main
    strip_prefix=args['--wavdir'])
  File "/usr/local/bin/audfprint/audfprint.py", line 155, in do_cmd
    msgs = matcher.file_match_to_msgs(analyzer, hash_tab, filename, num)
  File "/usr/local/bin/audfprint/audfprint_match.py", line 326, in file_match_to_msgs
    rslts, dur, nhash = self.match_file(analyzer, ht, qry, number)
  File "/usr/local/bin/audfprint/audfprint_match.py", line 317, in match_file
    rslts = self.match_hashes(ht, q_hashes)
  File "/usr/local/bin/audfprint/audfprint_match.py", line 272, in match_hashes
    results = self._approx_match_counts(hits, bestids, rawcounts)
  File "/usr/local/bin/audfprint/audfprint_match.py", line 228, in _approx_match_counts
    allbincounts = np.bincount((allids << timebits) + alltimes)
ValueError: The first argument of bincount must be non-negative

dpwe commented 7 years ago

The problem was in the way _approx_match_counts uses np.bincount to quickly count the most popular combinations of IDs and time offsets (to identify the reference files with the best match). It combines each pair into a single int by shifting the ID up by enough bits to clear all the time offsets. In this case, the input file was very long (7 hours), so to clear all the time offsets, timebits was 21. Meanwhile allids included numbers up to 2973 (i.e., 12 bits wide), so the shifted values needed up to 33 bits. Because the values had been taken out of the big hash table, they were 32-bit ints, so whenever the shift set the top (sign) bit, they flipped to become negative.
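A small NumPy sketch that reproduces the overflow, using the numbers from this report (timebits = 21, IDs up to 2973). The `ids` and `times` arrays are made-up sample values, not data from the failing run, and the packing expression is copied from the traceback rather than from the full audfprint source.

```python
import numpy as np

# Numbers from this report: timebits was 21 and the largest ID was 2973.
TIMEBITS = 21
MAX_ID = 2973

# A packed value needs (bits in the ID) + TIMEBITS bits in total.
id_bits = MAX_ID.bit_length()      # 12
total_bits = id_bits + TIMEBITS    # 33, more than an int32 can hold

# Hypothetical IDs and time offsets, stored as int32 like the hash-table values.
ids = np.array([5, 1500, 2973], dtype=np.int32)
times = np.array([100, 200, 300], dtype=np.int32)

# Same packing expression as in the traceback; int32 shifts wrap silently.
# 1500 has bit 10 set, which lands on bit 31 (the int32 sign bit), so it
# turns negative. 2973's bit 11 is pushed past bit 31 and dropped, leaving
# a value that is positive but wrong.
packed = (ids << TIMEBITS) + times
print(packed.dtype, packed)

try:
    np.bincount(packed)
    bincount_failed = False
except ValueError as err:
    bincount_failed = True
    print("bincount failed:", err)
```

Note that the error only surfaces because some IDs go negative; IDs whose high bits are silently discarded would instead be counted in the wrong bin without any error.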

Converting allids and alltimes to default (64-bit) ints seems to fix the problem.
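A minimal sketch of this kind of fix, assuming NumPy: widen both arrays to 64 bits before shifting, then unpack the winning bin back into an (ID, time) pair. The helper name `best_id_time_pair` is hypothetical, not audfprint's API, and it assumes the time offsets have already been made non-negative. It casts to an explicit `np.int64` rather than the platform default int, because on some platforms (notably Windows with NumPy releases before 2.0) the default integer is 32 bits, which would bring the overflow back.

```python
import numpy as np


def best_id_time_pair(ids, times):
    """Return (id, time, count) for the most frequent (id, time) pair.

    Hypothetical helper, not audfprint's API. Assumes ids and times are
    equal-length arrays of non-negative integers.
    """
    # Widen to int64 *before* shifting so the packed value cannot wrap.
    ids64 = np.asarray(ids).astype(np.int64)
    times64 = np.asarray(times).astype(np.int64)
    # Just enough low bits to hold the largest time offset (at least 1).
    timebits = max(1, int(times64.max()).bit_length())
    packed = (ids64 << timebits) + times64
    counts = np.bincount(packed)
    best = int(np.argmax(counts))
    # High bits hold the ID; the low `timebits` bits hold the time offset.
    return best >> timebits, best & ((1 << timebits) - 1), int(counts[best])


# IDs up to 2973 shifted by 21 bits now fit without going negative.
ids32 = np.array([5, 1500, 2973], dtype=np.int32)
times32 = np.array([100, 200, 300], dtype=np.int32)
packed64 = (ids32.astype(np.int64) << 21) + times32.astype(np.int64)
print(packed64.dtype, packed64)

# Small usage example: ID 1500 at time offset 3 occurs twice.
print(best_id_time_pair([1500, 1500, 7, 1500], [3, 3, 5, 4]))  # (1500, 3, 2)
```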

dpwe commented 7 years ago

closed via a076c60