elisemercury / Duplicate-Image-Finder

difPy - Python package for finding duplicate or similar images within folders
https://difpy.readthedocs.io
MIT License

ValueError. #2

Closed rqtqp closed 2 years ago

rqtqp commented 2 years ago

Hi there,

I'm trying to run difPy on a folder with more than 80k images and get this error:

Traceback (most recent call last):
  File ".\difpy.py", line 3, in <module>
    dif.compare_images("PATH TO FOLDER")
  File "C:\Users\user\.conda\envs\gan\lib\site-packages\difPy\dif.py", line 35, in compare_images
    imgs_matrix = dif.create_imgs_matrix(directory, px_size)
  File "C:\Users\user\.conda\envs\gan\lib\site-packages\difPy\dif.py", line 121, in create_imgs_matrix
    imgs_matrix = np.concatenate((imgs_matrix, img))
  File "<__array_function__ internals>", line 6, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 2 dimension(s)

What am I doing wrong?

Thanks in advance

elisemercury commented 2 years ago

Hello rqtqp!

Thanks a lot for your comment and for reaching out! I'm not sure yet what the issue is, as there are several possible causes. To help me narrow it down: what file types are the 80k images? Are they all the same type, or does the folder contain a mix? And when does the error occur: right at the start when you call the function, or after the code has been running for a while?

I will work on fixing the issue ASAP and will give you an update in the next few days.

Again, thanks and all the best, Elise

ppizarror commented 2 years ago

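Hi, I was able to reproduce this error when the folder contains black-and-white (grayscale) images. Such an image loads as a 2-dimensional array (height x width, with no channel axis), while the other images load as 3-dimensional arrays, so np.concatenate fails. After converting the images to RGB, the process ran without errors.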
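The mismatch can be reproduced with NumPy alone. The snippet below is only a sketch: the patch itself is not shown in this thread, and the arrays and the `to_three_channels` helper are hypothetical stand-ins for decoded images and for whatever conversion was actually applied.

```python
import numpy as np

# Hypothetical stand-ins for decoded images (not taken from difPy itself).
color_img = np.zeros((50, 50, 3), dtype=np.uint8)  # height x width x channels
gray_img = np.zeros((50, 50), dtype=np.uint8)      # height x width, no channel axis


def to_three_channels(img):
    """Return img with shape (H, W, 3); a 2-D grayscale array is repeated per channel."""
    if img.ndim == 2:
        return np.stack([img] * 3, axis=-1)
    return img


# Mixing a 3-D and a 2-D array raises the ValueError from the traceback.
try:
    np.concatenate((color_img, gray_img))
    failed = False
except ValueError as err:
    failed = True
    print(err)

# Once the grayscale array has a channel axis, concatenation succeeds.
fixed = np.concatenate((color_img, to_three_channels(gray_img)))
print(fixed.shape)  # (100, 50, 3)
```

Repeating the single channel three times keeps the pixel values unchanged, so a grayscale image and its converted copy still compare as identical content.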

Without the patch:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/difPy/dif.py", line 36, in compare_images
    imgs_matrix = dif.create_imgs_matrix(directory, px_size)
  File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/difPy/dif.py", line 124, in create_imgs_matrix
    imgs_matrix = np.concatenate((imgs_matrix, img))
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 3 dimension(s) and the array at index 1 has 2 dimension(s)

After the patch:

***
Found 0 duplicate image pairs in 33458 total images.

The following files have lower resolution:
[]

aknpp commented 2 years ago

I had the same error. It was due to images that were black and white.

I fixed it by changing cv2.IMREAD_UNCHANGED to cv2.IMREAD_COLOR in this line: https://github.com/elisemercury/Duplicate-Image-Finder/blob/02c1fe2fa9915de473503b80b5f671a850447c5d/difPy/dif.py#L112

Other read modes are listed here. I don't know how this change affects performance, but it solves the issue.

rqtqp commented 2 years ago

Confirmed on my end. Once all images were converted to RGB, the process ran smoothly.