AkshDesai04 / PyCompare

This Python project aims to efficiently compare large datasets of images to identify duplicates
MIT License

Image being read multiple times #16

Open AkshDesai04 opened 4 months ago

AkshDesai04 commented 4 months ago

Each image file is currently read from disk multiple times:

  1. while reading its metadata
  2. in the comparison `for` loop over `i`
  3. in the comparison `for` loop over `j`

Reading the images from disk is what takes the longest; the rest is nearly instantaneous. One approach is to keep the images in RAM, in a list, when we first read them from disk for the metadata step. However, this will not work when the dataset is hundreds of GBs or even TBs in size.
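One possible way around the RAM limit is to read each file once and keep only a small fingerprint of it instead of the image itself. The sketch below is a minimal illustration of that idea, not PyCompare's actual code: the helper names `file_digest`, `group_by_digest`, and `find_duplicates` are hypothetical, and it assumes that "duplicate" means byte-identical files. Images that look the same but are encoded differently would need a different fingerprint (for example a perceptual hash), which is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Read the file once, in 1 MiB chunks, and return its SHA-256 hex digest."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Chunked reads keep memory use bounded even for very large files.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_by_digest(paths):
    """Map each digest to the paths that share it (one disk read per file)."""
    groups = defaultdict(list)
    for path in paths:
        groups[file_digest(path)].append(Path(path))
    return groups


def find_duplicates(paths):
    """Return only the groups that contain more than one file."""
    return [group for group in group_by_digest(paths).values() if len(group) > 1]
```

With this layout, memory grows with the number of files (one short digest per file) rather than with the total size of the dataset, and the nested `i`/`j` comparison is replaced by a dictionary lookup, so no file is read a second time.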