erdogant / undouble

undouble is a Python package for detecting (near-)identical images.
BSD 3-Clause "New" or "Revised" License

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all() #1

Closed lazy-programm-er closed 2 years ago

lazy-programm-er commented 2 years ago

I am getting this error when calling model.import_data(targetdir).

Each image is around 5-8 MB.

Does the image size cause this error?

model = Undouble(method='phash', hash_size=8)
model.import_data(targetdir)

[undouble] >INFO> Extracting images from: [./images/set1/]
[undouble] >INFO> [6032] files are collected recursively from path: [./images/set1/]
[undouble] >INFO> [6032] images are extracted.
[undouble] >INFO> Reading and checking images.
[undouble] >INFO> Reading and checking images.
3%|▎ | 203/6032 [00:26<07:00, 13.85it/s]Corrupt JPEG data: 36 extraneous bytes before marker 0xd9
4%|▍ | 239/6032 [00:31<18:40, 5.17it/s]Corrupt JPEG data: 117 extraneous bytes before marker 0xd9
5%|▌ | 313/6032 [00:41<14:34, 6.54it/s]Corrupt JPEG data: 90 extraneous bytes before marker 0xd9
9%|▊ | 521/6032 [01:03<08:07, 11.30it/s]Corrupt JPEG data: 1895 extraneous bytes before marker 0xd9
11%|█ | 661/6032 [01:17<11:03, 8.10it/s]Corrupt JPEG data: 50 extraneous bytes before marker 0xd9
13%|█▎ | 795/6032 [01:32<08:00, 10.90it/s]Corrupt JPEG data: 47 extraneous bytes before marker 0xd9
18%|█▊ | 1069/6032 [02:03<08:30, 9.72it/s]Corrupt JPEG data: 1076 extraneous bytes before marker 0xd9
19%|█▉ | 1164/6032 [02:11<05:45, 14.08it/s]Invalid SOS parameters for sequential JPEG
21%|██ | 1241/6032 [02:20<12:14, 6.52it/s]Corrupt JPEG data: 51 extraneous bytes before marker 0xd9
26%|██▌ | 1554/6032 [02:57<15:41, 4.76it/s]Corrupt JPEG data: 102 extraneous bytes before marker 0xd9
28%|██▊ | 1669/6032 [03:11<08:54, 8.16it/s]Corrupt JPEG data: 116 extraneous bytes before marker 0xd9
29%|██▉ | 1747/6032 [03:21<06:43, 10.62it/s]Corrupt JPEG data: 1487 extraneous bytes before marker 0xd9
32%|███▏ | 1930/6032 [03:42<05:48, 11.78it/s]Corrupt JPEG data: 43 extraneous bytes before marker 0xd9
35%|███▌ | 2130/6032 [04:04<05:06, 12.73it/s]Corrupt JPEG data: 89 extraneous bytes before marker 0xd9
36%|███▋ | 2196/6032 [04:12<05:38, 11.34it/s][undouble] >WARNING> Scaling not possible.
[undouble] >WARNING> Could not read: [./images/set1/123.jpg]
37%|███▋ | 2248/6032 [04:18<06:33, 9.62it/s][undouble] >WARNING> Scaling not possible.
[undouble] >WARNING> Could not read: [./images/set1/215.jpg]
44%|████▍ | 2666/6032 [05:15<05:57, 9.41it/s]Invalid SOS parameters for sequential JPEG
46%|████▋ | 2791/6032 [05:32<05:42, 9.46it/s][undouble] >WARNING> Scaling not possible.
[undouble] >WARNING> Could not read: [./images/set1/322.jpg]
59%|█████▉ | 3577/6032 [07:22<03:31, 11.61it/s]Invalid SOS parameters for sequential JPEG
61%|██████▏ | 3695/6032 [07:36<04:37, 8.43it/s]Corrupt JPEG data: 1640 extraneous bytes before marker 0xd9
65%|██████▍ | 3913/6032 [08:06<07:28, 4.73it/s]Corrupt JPEG data: 92 extraneous bytes before marker 0xd9
65%|██████▌ | 3945/6032 [08:10<04:11, 8.28it/s]Corrupt JPEG data: 1894 extraneous bytes before marker 0xd9
76%|███████▌ | 4578/6032 [09:31<03:15, 7.43it/s]Corrupt JPEG data: 38 extraneous bytes before marker 0xd9
79%|███████▉ | 4754/6032 [09:52<01:43, 12.40it/s]Invalid SOS parameters for sequential JPEG
86%|████████▌ | 5193/6032 [10:46<02:04, 6.76it/s]Corrupt JPEG data: 41 extraneous bytes before marker 0xd9
86%|████████▌ | 5196/6032 [10:47<01:33, 8.98it/s]Corrupt JPEG data: 41 extraneous bytes before marker 0xd9
86%|████████▋ | 5205/6032 [10:47<01:16, 10.86it/s]Corrupt JPEG data: 364 extraneous bytes before marker 0xd2
94%|█████████▍| 5690/6032 [11:43<00:41, 8.20it/s][undouble] >WARNING> Scaling not possible.
[undouble] >WARNING> Could not read: [./images/set1/445.jpg]
97%|█████████▋| 5855/6032 [12:03<00:13, 13.48it/s]Invalid SOS parameters for sequential JPEG
98%|█████████▊| 5891/6032 [12:09<00:13, 10.52it/s]Corrupt JPEG data: 98 extraneous bytes before marker 0xd9
100%|██████████| 6032/6032 [12:27<00:00, 8.07it/s]

Traceback (most recent call last):
  File "app.py", line 64, in <module>
    model.import_data(targetdir)
  File "/root/.local/lib/python3.6/site-packages/undouble/undouble.py", line 142, in import_data
    self.results = self.clustimage.import_data(self.params['targetdir'], black_list=black_list)
  File "/root/.local/lib/python3.6/site-packages/clustimage/clustimage.py", line 997, in import_data
    X = self.preprocessing(Xraw, grayscale=self.params['cv2_imread_colorscale'], dim=self.params['dim'], flatten=flatten)
  File "/root/.local/lib/python3.6/site-packages/clustimage/clustimage.py", line 833, in preprocessing
    if np.where(np.array(list(map(len, img)))<min_nr_pixels)[0]:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
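For readers hitting the same message: the last frame of the traceback uses the index array returned by `np.where(...)[0]` directly as an `if` condition. NumPy refuses to truth-test an array with more than one element, so the check raises as soon as two or more images fall below the threshold. The sketch below reproduces that behaviour in isolation. The values of `lengths` and `min_nr_pixels` and the two helper functions are hypothetical stand-ins, and `explicit_check` is only one possible unambiguous form, not necessarily the fix that was applied in clustimage.

```python
import numpy as np

# Hypothetical stand-ins for the values used inside the preprocessing step.
lengths = np.array([16384, 0, 16384, 0])  # two entries are below the threshold
min_nr_pixels = 8

# Indices of the entries that are too small (two of them here).
too_small = np.where(lengths < min_nr_pixels)[0]


def ambiguous_check(idx):
    """Mimic `if np.where(...)[0]:` from the traceback and report what happens."""
    try:
        if idx:
            return "truthy"
        return "falsy"
    except ValueError as err:
        return f"ValueError: {err}"


def explicit_check(idx):
    """Unambiguous alternative: ask whether any index was found at all."""
    return idx.size > 0


print(ambiguous_check(too_small))
print(explicit_check(too_small))
```

Testing `idx.size > 0` (or `len(idx) > 0`) states the intent directly and behaves the same regardless of how many images are affected.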

lazy-programm-er commented 2 years ago

I guess it's due to corrupt images in the dataset. Can you add a feature to exclude corrupt images from the dataset?
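Until such a feature is available, one crude way to pre-screen a folder is to check that each file starts with the JPEG start-of-image marker (FF D8) and ends with the end-of-image marker (FF D9). This is a minimal standard-library sketch with hypothetical helper names, not the approach used in undouble. It only flags truncated or non-JPEG files. It will not detect corruption inside the stream, such as the "extraneous bytes" cases in the log above, which require actually decoding the image, and some otherwise readable JPEGs carry trailing bytes after the end marker and would be flagged as suspect.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker


def looks_like_complete_jpeg(path):
    """Return True if the file starts with SOI and ends with EOI."""
    data = Path(path).read_bytes()
    return len(data) >= 4 and data.startswith(JPEG_SOI) and data.endswith(JPEG_EOI)


def split_by_jpeg_markers(paths):
    """Split paths into (good, suspect) lists using the marker heuristic."""
    good, suspect = [], []
    for path in paths:
        if looks_like_complete_jpeg(path):
            good.append(path)
        else:
            suspect.append(path)
    return good, suspect
```

The suspect files can then be moved out of the target directory before calling `import_data`.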

erdogant commented 2 years ago

Thanks for posting! I am going to look into this. But first I need to find some corrupted images to play around with ;)

erdogant commented 2 years ago

I pushed an update with a fix for corrupted images. Can you check whether this solves your issue?

Version should be >= 1.2.1:

pip install -U undouble
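To confirm that the installed version meets that minimum, a small comparison helper can be used. This is a sketch with hypothetical function names; it handles plain dotted numeric versions such as 1.2.1 only and does not understand pre-release or local suffixes.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '1.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required="1.2.1"):
    """Return True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)
```

On Python 3.8 or later, the installed version string can be obtained with `importlib.metadata.version("undouble")` and passed to `is_at_least`; on older interpreters such as the Python 3.6 shown in the traceback, `pip show undouble` prints the same information.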

lazy-programm-er commented 2 years ago

Thanks, I will check this. I have also found another error: it fails when the image resolution is higher than 8K.

I have some images that are 10400 x 10280 pixels.

It happens in this block: https://github.com/erdogant/clustimage/blob/f71dc91df379bb52a06e78662d2b344b9acfc555/clustimage/clustimage.py#L2107

For now, I am setting dim to (128, 128), or some other explicit value, to work around this.

erdogant commented 2 years ago

Thanks. I could reproduce the error and added a warning for when a dim larger than (1024, 1024) is used.
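A guard of the kind described here could look roughly like the sketch below. It is an illustration only, not the code that was added to the library: the function name `check_dim`, the constant `MAX_DIM`, the warning text, and the choice to warn when either side exceeds the limit are all assumptions.

```python
import warnings

MAX_DIM = (1024, 1024)  # threshold mentioned in the thread


def check_dim(dim, max_dim=MAX_DIM):
    """Warn if either side of `dim` exceeds `max_dim`; return `dim` unchanged."""
    if dim is not None and (dim[0] > max_dim[0] or dim[1] > max_dim[1]):
        warnings.warn(
            f"dim={dim} is larger than {max_dim}; consider a smaller value "
            "such as (128, 128).",
            UserWarning,
        )
    return dim
```

Calling `check_dim((10400, 10280))` emits a `UserWarning`, while `check_dim((128, 128))` passes silently.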