Closed: rdecaneva closed this 6 years ago
Good question. You do this by storing each item as a tuple or namedtuple of (hash, id), with a distance function that compares only the hash part, so the id rides along and comes back with every match. See the example I just added to the README under "If you need to track the ID, key, or filename of the original item, use a tuple or namedtuple. Repeating the above example with an Item namedtuple:"
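For anyone reading along, here is a rough, self-contained sketch of the (hash, id) idea. To keep it runnable without installing anything, it uses a tiny stand-in tree (`TinyBKTree`, a name made up for this sketch) rather than the library's own implementation. The `Item` fields `bits` and `id`, the integer ids, and the sample hash values are all assumptions for illustration; in practice `bits` would be the image's dhash and `id` the primary key from your SQL table.

```python
from collections import namedtuple

# Hypothetical item type: `bits` is the image hash as an int,
# `id` is whatever identifies the image (e.g. the SQL primary key).
Item = namedtuple("Item", "bits id")


def hamming_distance(x, y):
    """Number of differing bits between two non-negative ints."""
    return bin(x ^ y).count("1")


def item_distance(a, b):
    """Compare items by hash only; the id is ignored."""
    return hamming_distance(a.bits, b.bits)


class TinyBKTree:
    """Minimal stand-in BK-tree for illustration only."""

    def __init__(self, distance_func, items=()):
        self.distance_func = distance_func
        self.root = None  # each node is (item, {edge_distance: child_node})
        for item in items:
            self.add(item)

    def add(self, item):
        if self.root is None:
            self.root = (item, {})
            return
        node = self.root
        while True:
            parent, children = node
            d = self.distance_func(item, parent)
            if d in children:
                node = children[d]  # descend along the matching edge
            else:
                children[d] = (item, {})
                return

    def find(self, item, n):
        """Return (distance, item) pairs within distance n of `item`."""
        if self.root is None:
            return []
        found = []
        stack = [self.root]
        while stack:
            candidate, children = stack.pop()
            d = self.distance_func(item, candidate)
            if d <= n:
                found.append((d, candidate))
            # Triangle inequality: only edges in [d - n, d + n] can hold matches.
            for edge, child in children.items():
                if d - n <= edge <= d + n:
                    stack.append(child)
        # Sort by distance, then id, so the result order is deterministic.
        return sorted(found, key=lambda pair: (pair[0], pair[1].id))


tree = TinyBKTree(item_distance, [
    Item(0b0000, 1), Item(0b0100, 2), Item(0b0101, 3), Item(0b1111, 4),
])
tree.add(Item(0b10100, 5))

# The query item needs no id; only its bits are compared.
matches = tree.find(Item(0b0001, None), 2)
duplicate_ids = [match.id for _, match in matches]
print(duplicate_ids)  # [1, 3, 2]
```

Because the distance function never looks at `id`, the tree behaves exactly as it would with bare hashes, and each match carries the key you need to look up the original row or file.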
Hello, I am fairly new to Python, but I found your article very informative and easy to follow. I believe I have the script working for my needs. However, as you mention at the bottom of the article, the larger the database, the slower the comparisons.
I have a script with watchdog that waits for changes on a directory of images. When an image is uploaded, the file is processed, a dhash is generated, and then passed to a SQL database.
I've been experimenting with BK-trees. If I understand them correctly, the entire tree is held in memory while the script runs. My question is: how do I identify which image is the actual duplicate from the tree? How can I store a primary key or some other unique value in the tree so I can later identify which images are similar to each other?
Thank you!