bellingcat / auto-archiver

Automatically archive links to videos, images, and social media content from Google Sheets (and more).
https://pypi.org/project/auto-archiver/
MIT License
552 stars, 55 forks

Add pdqhash as perceptual hash to images and thumbnails #81

Closed by emieldatalytica 1 year ago

emieldatalytica commented 1 year ago

PDQ is a photo hashing algorithm that turns a photo into a 256-bit signature, which can then be used to match other photos. The goal of this kind of image hashing is to represent an image with a fixed-size value (the hash) that preserves visual similarity: images that look alike should produce similar hashes.

PDQ hashes enable fuzzy deduplication of images and, through their thumbnails, of videos.
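To illustrate what "similar hashes" could mean in practice, here is a minimal sketch of fuzzy deduplication over hashes that have already been computed. It assumes each hash is stored as a 64-character hex string (256 bits) and that similarity is measured as Hamming distance, i.e. the number of differing bits. The function names and the `max_distance=31` default are illustrative assumptions, not part of auto-archiver; the threshold should be tuned for the data at hand.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two 256-bit hashes given as hex strings."""
    if len(hash_a) != 64 or len(hash_b) != 64:
        raise ValueError("expected 64 hex characters (256 bits) per hash")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, max_distance: int = 31) -> bool:
    """Treat two images as duplicates if their hashes differ in few enough bits.

    max_distance is an assumed, tunable threshold.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


def dedupe(hashes: list[str], max_distance: int = 31) -> list[str]:
    """Greedy fuzzy deduplication: keep a hash only if no kept hash is close to it."""
    kept: list[str] = []
    for candidate in hashes:
        if not any(is_near_duplicate(candidate, k, max_distance) for k in kept):
            kept.append(candidate)
    return kept
```

This greedy pass compares every candidate against every kept hash, which is fine for small archives; a larger collection would likely need an index built for Hamming-distance search.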