dpwe / audfprint

Landmark-based audio fingerprinting
MIT License

Precomputing makes the accuracy worse! #50

Closed 2407adi closed 5 years ago

2407adi commented 5 years ago

Hey Dan, hope you are doing great.

So I precomputed fingerprints for some 'recordings' in which I wanted to find a subclip that is contained in all of them. In parallel, I also saved the 'recordings' as a single .pkl database and queried the same subclip against it. It turns out the first method fails to recognize the subclip in many of the recordings, whereas the second method works flawlessly. Below is just one such instance:

Results by 1st method: NOMATCH precomp/home/ubuntu/mm/audfprint-master/tests/data/ABC001 2018-09-08 16-00-00.afpt 3655.4 sec 377315 raw hashes

Results by 2nd method: Matched 46.6 s starting at 935.6 s in ./tests/data/ABC001 2018-09-08 16-00-00.mp3 to time 2.0 s in ./tests/data/adi/clip.mp3 with 1132 of 35696 common hashes at rank 0 count 8

I hope that makes the issue clear enough. Thanks!

dpwe commented 5 years ago

I don't quite understand the difference between the examples, but I have a guess. It might be the asymmetric treatment of reference and query items: query items are, by default, analyzed at four different offsets within a single frame, whereas reference items are analyzed at just one offset (for speed and to minimize size). If you explicitly specify --shifts 4, it will make this treatment symmetric.
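As a toy illustration of that asymmetry (this is not audfprint's real analysis code, just a sketch of the idea): hashing fixed-size frames at several sub-frame offsets makes the hash set robust to small misalignments between query and reference, whereas a single framing of a misaligned copy can miss entirely.

```python
# Toy sketch (not audfprint's actual analysis) of why --shifts matters:
# hash consecutive non-overlapping frames, optionally re-framing at
# several small offsets, and compare the resulting hash sets.
def frame_hashes(samples, frame_len, offset):
    """Hash consecutive non-overlapping frames starting at `offset`."""
    return {tuple(samples[i:i + frame_len])
            for i in range(offset, len(samples) - frame_len + 1, frame_len)}

def analyze(samples, frame_len=4, shifts=1):
    """Union of frame hashes over `shifts` evenly spaced sub-frame offsets."""
    step = max(1, frame_len // shifts)
    hashes = set()
    for k in range(shifts):
        hashes |= frame_hashes(samples, frame_len, k * step)
    return hashes

signal = list(range(32))
reference = analyze(signal, shifts=1)        # default reference: 1 shift
misaligned = signal[1:]                      # same audio, shifted 1 sample
one_shift = analyze(misaligned, shifts=1)    # precomputed query default
four_shift = analyze(misaligned, shifts=4)   # like --shifts 4 on the query
print(len(reference & one_shift), len(reference & four_shift))  # → 0 7
```

The 1-shift analysis of the shifted copy shares no hashes with the reference, while the 4-shift analysis recovers the overlap because one of its framings lines up with the reference framing.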

DAn.

2407adi commented 5 years ago

Hey Dan, sorry for being vague. I tried to play around with the shifts parameter as you suggested, but unfortunately it didn't solve the issue, so here goes my second try.

Here, I'll refer to the use case of searching for ads in different recordings, as described in "searching_for_ads.md".

So, in the first case, the reference database is built from the ad, and the landmark features of all the recordings are saved as .afpt files; I then match the precomputed landmarks against the reference database. Here's the command:

python3 audfprint.py match --dbase ad.db --min-count 100 --max-matches 100 --sortbytime --opfile precomp_test.csv --ncores 4 --find-time-range --exact-count --maxtimebits 18 --list precomp.list

In the second case, I pickle the recordings as the reference database and match ad.mp3 against it to get the results. Command:

python3 audfprint.py match --dbase all_recordings.pklz --density 200 --hashbits 20 --find-time-range --exact-count --max-matches 100 --min-count 100 --ncores 4 --maxtimebits 18 --opfile result.csv /home/ubuntu/mm/audfprint-master/tests/data/ad.mp3

Now, in the above example, the 2nd method works great, but the 1st one fails to identify the ad in many of the recordings; one such instance is shown below.

Result by 1st method : NOMATCH precomp/home/ubuntu/mm/audfprint-master/tests/data/Recording_number_420.afpt 3655.4 sec 377315 raw hashes

Result by 2nd method : Matched 46.5 s starting at 1846.5 s in ./tests/data/Recording_number_420.mp3 to time 2.1 s in ./tests/data/ad.mp3 with 1020 of 35792 common hashes at rank 0 count 3

So the algorithm identifies the ad in the 2nd case but is not able to do so in the 1st.

I hope I've made myself clearer this time. Also, I tried explicitly setting --shifts 4, but it doesn't seem to help.

Thanks a ton! Aditya

dpwe commented 5 years ago

What commands did you use to create all_recordings.pklz, and the files contained in precomp.list? That's where options like --shifts will come into play.

DAn.

2407adi commented 5 years ago

Okay, so the command I used to make all_recordings.pklz was:

python3 audfprint.py new --dbase all_recordings.pklz --density 100 --maxtimebits 18 '/home/ubuntu/mm/audfprint-master/tests/data/Recordings/*'

The commands I used to make the '.afpt' fingerprint files were:

  1. ls -1 ./tests/data/test/Recordings/*.mp3 > records1.list
  2. python3 audfprint.py precompute --maxtimebits 18 --density 100 --hashbits 20 --precompdir precomp --list records1.list --ncores 4

The command I used to make the reference database in the precompute case was:

  1. python3 audfprint.py new --dbase ad.db --density 200 --maxtimebits 18 ./tests/data/ad.mp3

I hope these inputs help. Thanks a lot! :)

EDIT: One more small thing. In the results, when the output says:

Matched... some_string ...with 1170 of 36594 common hashes at rank 0 count 1

does this mean that, for the duration of the match, 36594 was the total number of landmarks, of which 1170 matched?

dpwe commented 5 years ago

Did you try adding --shifts 4 to the python3 audfprint.py precompute ... line (when creating the precomputed files used in the precomp.list match command)?

DAn.

2407adi commented 5 years ago

Extremely sorry for the late reply, Dan.

Yes, I have tried the --shifts 4 argument on the precompute line; the results are the same. Also, I'd be very grateful if you could shed some light on the common-hashes question, i.e.

Matched... some_string ...with 1170 of 36594 common hashes at rank 0 count 1

Does this mean that, for the duration of the match, there were 36594 total landmarks of which 1170 matched, or is 36594 the total number of landmarks in the entire recording, of which 1170 common matches were found?

dpwe commented 5 years ago

OK. To answer your last question, the report:

Matched query.mp3 5.6 sec 1896 raw hashes as Nine_Lives/05-Full_Circle.mp3 at   -0.0 s 
with   141 of   159 common hashes at rank  0

means that the query file query.mp3 was analyzed to 1,896 hashes. 159 of these also appeared in the Full_Circle reference item, but only 141 of those were consistent with the -0.0 s alignment time between query and reference. For efficiency, the first stage of matching simply selects reference items that contain many of the same hashes as the query. Then the second stage looks at the timings of those hashes, and sees if their relative timing is consistent between reference and query; it's the number of these timing-consistent common hashes that determines the overall match quality (inconsistent timings indicate coincidences).
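A toy sketch of that two-stage logic (this is not audfprint's actual code; the function name and the simple offset histogram are illustrative only):

```python
# Toy sketch of the two-stage match described above: stage 1 counts raw
# common hashes, stage 2 keeps only those whose (reference_time - query_time)
# offset agrees with the single most popular alignment.
from collections import Counter

def match_stats(query_hashes, ref_hashes):
    """Each argument is a list of (time, hash) pairs.
    Returns (common, offset_consistent) hash counts."""
    ref_by_hash = {}
    for t, h in ref_hashes:
        ref_by_hash.setdefault(h, []).append(t)
    # Stage 1: every query hash that also occurs in the reference,
    # recording the implied time offset of each co-occurrence.
    offsets = []
    for t, h in query_hashes:
        for rt in ref_by_hash.get(h, []):
            offsets.append(rt - t)
    if not offsets:
        return 0, 0
    # Stage 2: keep only the hashes consistent with the best single offset.
    best_offset, consistent = Counter(offsets).most_common(1)[0]
    return len(offsets), consistent

query = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]
ref   = [(10, 'a'), (11, 'b'), (12, 'c'), (7, 'd')]  # 'd' at a wrong offset
print(match_stats(query, ref))  # → (4, 3): 4 common, 3 offset-consistent
```

Here hash 'd' is a coincidental collision: it is counted among the common hashes but rejected by the timing filter, mirroring the "141 of 159" distinction above.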

Now, concerning --shifts: a larger number of shifts means that the audio is analyzed at greater density (it is reanalyzed at several small time shifts, to mitigate dependence on the precise framing of the STFT). By default, reference items are analyzed at 1 shift only, for speed and to reduce the size of the reference database. Query items, however, are analyzed at 4 shifts to increase the likelihood of finding matches. But when precomputing hashes, the program doesn't know whether the precomputed file will be used as reference or query; it defaults to 1 shift (the same as if it were being directly imported as a reference item). Thus, when building a reference database, the results are the same with or without precompute:

$ python audfprint.py new --dbase fpdbase.pklz Nine_Lives/0*.mp3
...
Saved fprints for 9 files ( 2184 hashes) to fpdbase.pklz (0.00% dropped)
$ python audfprint.py precompute --precompdir precompdir Nine_Lives/0*.mp3
...
$ python audfprint.py new --dbase fpdbase.pklz precompdir/Nine_Lives/0*.afpt
...
Saved fprints for 9 files ( 2184 hashes) to fpdbase.pklz (0.00% dropped)

When matching, using the sound file directly uses 4 shifts, with the result that the match is found:

$ python audfprint.py  match --dbase fpdbase.pklz query.mp3
...
Matched query.mp3 5.6 sec 271 raw hashes as precompdir/Nine_Lives/05-Full_Circle.afpt at    0.0 s 
with    11 of    11 common hashes at rank  0

... whereas using precomputed hashes for the query implicitly analyzes at 1 shift only, and fails to find the match:

$ python audfprint.py precompute --precompdir precompdir query.mp3 
...
wrote precompdir/query.afpt ( 86 hashes, 5.832 sec)
...
$ python audfprint.py  match --dbase fpdbase.pklz precompdir/query.afpt 
...
NOMATCH precompdir/query.afpt 5.6 sec 86 raw hashes

Explicitly requesting 4 shifts during precompute restores the default behavior:

$ python audfprint.py precompute --precompdir precompdir --shifts 4 query.mp3 
...
wrote precompdir/query.afpt ( 271 hashes, 5.832 sec)
...
$ python audfprint.py  match --dbase fpdbase.pklz precompdir/query.afpt 
...
Matched precompdir/query.afpt 5.6 sec 271 raw hashes as precompdir/Nine_Lives/05-Full_Circle.afpt at    0.0 s 
with    11 of    11 common hashes at rank  0

I'm not sure if this is the cause of your issue, but I hope this makes things a little more clear.

DAn.

2407adi commented 5 years ago

That was a great explanation, Dan; it surely makes things a bit less foggy. Thanks!