Max file in database - Githubissues

mariagibert commented 5 years ago

Hi!

I'm creating the database out of 10k files. My question is, is it possible? While processing the files, I've seen this message appear:

Read fprints for 2023 files ( 8013111 hashes) from fpdbase.pklz (1.09% dropped)

The dropped percentage keeps getting higher, but the number of files ("2023") doesn't increase.

Should I create smaller databases?

Thanks!

dpwe commented 5 years ago

The "hashes dropped" proportion will increase as the database fills up. However, it's dropping hashes from every track, not entire tracks. Recognition will be mildly impacted, but everything should continue to work. I wouldn't worry until the drop %age reaches 5-10%, and even then it should be OK.

As the database becomes fuller, matching becomes slower and memory usage increases. You could break the collection up into multiple databases and match against each one sequentially; however, that would be even slower (but would bound the memory use).
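The split-and-match-sequentially idea could look roughly like the sketch below. It only builds the command lines and writes the per-chunk list files; it does not run audfprint. The chunk size, the `part_N.txt` / `part_N.pklz` names, and the helper function names are illustrative assumptions, and the path to `audfprint.py` depends on your checkout.

```python
from pathlib import Path


def split_into_chunks(paths, chunk_size):
    """Split a list of audio paths into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [paths[i:i + chunk_size] for i in range(0, len(paths), chunk_size)]


def plan_commands(paths, chunk_size, query, workdir="."):
    """Write one list file per chunk and return (build, match) command lists.

    The part_N.txt / part_N.pklz names are hypothetical, chosen for illustration.
    """
    workdir = Path(workdir)
    build, match = [], []
    for n, chunk in enumerate(split_into_chunks(paths, chunk_size)):
        list_file = workdir / f"part_{n}.txt"
        dbase = workdir / f"part_{n}.pklz"
        # One full file path per line, as expected by --list.
        list_file.write_text("\n".join(chunk) + "\n")
        build.append(["python", "audfprint.py", "new",
                      "--dbase", str(dbase), "--list", str(list_file)])
        match.append(["python", "audfprint.py", "match",
                      "--dbase", str(dbase), query])
    return build, match
```

Each build command would be run once, and the match commands would then be run one after another for a given query, for example with `subprocess.run(cmd, check=True)`. Only one partial database needs to be in memory at a time, at the cost of repeating the match for every part.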

mariagibert commented 5 years ago

Great! Thanks a lot for your answer!

dpwe commented 5 years ago

"2023" should be the total number of files you've added to fpdbase.pklz. I can't think why it wouldn't change if the %age dropped is increasing. What is the sequence of commands you're using? "audfprint new" followed by several "audfprint add" commands? Are you using the same names for the files you're adding in each run? I haven't actually ever tried that, but you need to use unique names for each track in the database, otherwise you won't be able to tell them apart when you get the match reports.

mariagibert commented 5 years ago

I'm using "add" because every audio file is in a different directory. Each file has a different name.

Should it be fine?

dpwe commented 5 years ago

It should be fine. But you can use --list and full file paths in a text file (or even just multiple full file paths on the command line) to add many files at once, and it will be much quicker.

Dan.
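For files scattered across many directories, the text file for `--list` could be generated with a short script like the one below. This is a minimal sketch: the function name, the extension set, and the `filelist.txt` name are assumptions made for illustration, not part of audfprint.

```python
import os

# Assumption: adjust this set to the formats in your collection.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def write_file_list(root_dirs, list_path, extensions=AUDIO_EXTENSIONS):
    """Walk each root directory and write one absolute audio path per line.

    Returns the number of paths written.
    """
    paths = []
    for root in root_dirs:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                # Compare extensions case-insensitively (.MP3 vs .mp3).
                if os.path.splitext(name)[1].lower() in extensions:
                    paths.append(os.path.abspath(os.path.join(dirpath, name)))
    paths.sort()
    with open(list_path, "w") as f:
        for p in paths:
            f.write(p + "\n")
    return len(paths)
```

After something like `write_file_list(["/path/to/audio"], "filelist.txt")`, all files could be added in one run with `python audfprint.py add --dbase fpdbase.pklz --list filelist.txt`. Full paths keep each track's name unique in the match reports even when base file names repeat across directories.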

mariagibert commented 5 years ago

Oh! Perfect! Thanks!

dpwe / audfprint

Max file in database #54