elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License

BUG: Filepaths with square brackets, [ ], are ignored/errored #94

Open MarcG2 opened 4 months ago

MarcG2 commented 4 months ago

I discovered another odd bug. If I have files located in a folder that contains square brackets in the name, difPy ends up ignoring those files. If the file name contains brackets, there doesn't seem to be a problem.

I'm using the CLI version on Windows in case that makes a difference.

EDIT: This is in regard to v4.0.1. I haven't tried v4.1, which I noticed just came out.

elisemercury commented 4 months ago

Hi @MarcG2,

Thanks for reaching out! Please try the new version difPy v4.1.0 as it comes with some improvements around the algorithm. Feel free to reach out if the issue still persists.

Best, Elise

MarcG2 commented 4 months ago

I tested out the new version. The bug is still there. An example file path that causes the problem is:

D:\Pictures[2024] Images\img1.png

If there is only one bracket, as shown here, difPy works normally:

D:\Pictures[2024 Images\img1.png

So it appears to be a parsing issue. To replicate, simply create a folder with a bracket pair in its name and add 2 copies of the same image.
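The reproduction can also be checked without difPy, using only the standard library. This is a minimal sketch that assumes files are collected with something like `glob.glob(os.path.join(folder, "*"))`, which is a hypothetical stand-in for however difPy actually gathers files, not its real code. It compares a literal directory listing with a glob listing of the same folder.

```python
import glob
import os
import shutil
import tempfile

# Build the reproduction case from the issue in a temporary directory:
# a folder with a bracket pair in its name holding two identical files.
root = tempfile.mkdtemp()
folder = os.path.join(root, "[2024] Images")
os.makedirs(folder)
for name in ("img1.png", "img2.png"):
    with open(os.path.join(folder, name), "wb") as f:
        f.write(b"identical bytes")  # stand-in for two copies of one image

# os.listdir takes the path literally, so both files are found.
listed = sorted(os.listdir(folder))

# glob treats "[2024]" in the path as a pattern, so the folder never matches.
globbed = glob.glob(os.path.join(folder, "*"))

print(listed)   # ['img1.png', 'img2.png']
print(globbed)  # []

shutil.rmtree(root)  # clean up the temporary folder
```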

MarcG2 commented 1 month ago

I believe I know what's causing this problem. It's because difPy uses the glob library, which employs Unix-style pathname pattern matching.
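The pattern rules that glob applies to each path component come from the standard `fnmatch` module, and they also explain why a single bracket works while a pair does not. A small illustration using `fnmatch.fnmatchcase`:

```python
import fnmatch

# In glob/fnmatch patterns, "[...]" is a character class that matches exactly
# one character from the set, so "[2024]" matches "2", "0" or "4",
# not the literal text "[2024]".
print(fnmatch.fnmatchcase("[2024] Images", "[2024] Images"))  # False
print(fnmatch.fnmatchcase("2 Images", "[2024] Images"))       # True

# An unmatched "[" has no closing "]", so it is treated as a literal character.
print(fnmatch.fnmatchcase("[2024 Images", "[2024 Images"))    # True
```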

I recently encountered another app that failed on paths with square brackets for this exact reason. I haven't tested it yet, but square brackets need to be escaped.
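Python's standard library already provides `glob.escape()` for this. Below is a sketch of how a folder path could be escaped before a wildcard is appended; `list_files` is a hypothetical helper for illustration, not part of difPy.

```python
import glob
import os


def list_files(folder: str) -> list[str]:
    """Hypothetical helper: list every entry directly inside `folder`.

    glob.escape() wraps the special characters *, ? and [ in brackets
    (for example "[" becomes "[[]"), so the folder part is matched
    literally and only the trailing "*" acts as a wildcard.
    """
    return glob.glob(os.path.join(glob.escape(folder), "*"))


print(glob.escape(r"D:\Pictures[2024] Images"))  # D:\Pictures[[]2024] Images
```

Only the user-supplied part of the path is escaped; the wildcard added afterwards keeps its special meaning.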

If you don't want to implement a fix, I highly encourage you to at least update the documentation regarding this. Almost no Windows users are going to be familiar with Unix-style pattern matching.