ahmedh409 / deepfake-detection

GNU General Public License v3.0

Perceptual Image Hashing v1 #2

sambux1 closed 2 years ago

sambux1 commented 2 years ago

The system currently only supports text data as a test. At a minimum, we need to be able to upload and hash images and videos.

I will edit this issue with more information and a specific course of action.

ahmedh409 commented 2 years ago

If we can find a video- or image-hashing algorithm that hashes the media into a string, this is very easy to implement. The interface would just need an object representing the upload, with the media's file path as the input to that algorithm; we can then use the string output as the _s_data value we're currently using for blocks.

It doesn't matter if the algorithm is slow or old; it just needs to work so we can get the system off the ground.
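A minimal sketch of that interface, in Python for illustration only. `MediaUpload`, `hash_media`, and `block_data_for` are hypothetical names, not part of the repo, and SHA-256 over the raw file bytes is just a stand-in for whatever media-hashing algorithm we end up with (it is not a perceptual hash):

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MediaUpload:
    """Hypothetical object representing what is being uploaded."""
    path: Path        # file path of the media on disk
    media_type: str   # e.g. "image" or "video"


def hash_media(upload: MediaUpload) -> str:
    """Stand-in hasher: SHA-256 of the raw bytes, NOT a perceptual hash.

    Any algorithm that maps a file path to a string could be swapped in here.
    """
    return hashlib.sha256(upload.path.read_bytes()).hexdigest()


def block_data_for(upload: MediaUpload) -> str:
    """Return the string that would be stored as a block's _s_data value."""
    return hash_media(upload)
```

Since the rest of the system only sees the returned string, replacing the stand-in with a real image or video hash later should not require changes to how blocks are built.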

ahmedh409 commented 2 years ago

Starting with image hashing. Video hashing is much more complex, so we'll build up to that. I'm looking into perceptual hashes, which hash an image to the same value regardless of minor modifications (e.g. similar resolutions, similar colors, text). I found an implementation of pHash, a well-known open-source algorithm, so I'll try implementing that this week.

I'll also write up a guide on digital signal/image processing which would be useful for programming and image hashing. We'll need a "hashing guide" which covers general hashing algorithms, image hashing, and video hashing soon, too.

ahmedh409 commented 2 years ago

Here is a basic guide with some pseudocode to implement a perceptual hash. It isn't as robust as pHash, though.

Here is a really robust perceptual hash developed by the person who created pHash (above). Implementing it will take more time, so we can hold off for now, but we should strongly consider it later.

sambux1 commented 2 years ago

I implemented a rough version of perceptual image hashing. For now, it just takes an image as input, shrinks it to 8x8 pixels, converts it to grayscale, extracts one bit from each pixel, and outputs the 64 bits as a 16-digit hex string.
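The steps above can be sketched roughly as follows. This is an illustration in pure Python, not the code in the repo. It makes several assumptions: the image is already decoded into a 2D grid of `(r, g, b)` tuples, shrinking is done by block averaging, grayscale uses the common 0.299/0.587/0.114 luma weights, and each bit is set when the pixel is brighter than the mean (the average-hash rule). The function names are hypothetical.

```python
def shrink_to_8x8(pixels):
    """Downscale a 2D grid of (r, g, b) tuples to 8x8 by block averaging."""
    height, width = len(pixels), len(pixels[0])
    if height < 8 or width < 8:
        raise ValueError("image must be at least 8x8 pixels")
    small = []
    for by in range(8):
        y0, y1 = by * height // 8, (by + 1) * height // 8
        row = []
        for bx in range(8):
            x0, x1 = bx * width // 8, (bx + 1) * width // 8
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(block)
            # Average each colour channel over the block.
            row.append(tuple(sum(p[c] for p in block) / n for c in range(3)))
        small.append(row)
    return small


def image_hash(pixels):
    """Return a 16-digit hex string built from one bit per pixel of the 8x8 image."""
    small = shrink_to_8x8(pixels)
    # Convert to grayscale after shrinking, matching the order described above.
    gray = [0.299 * r + 0.587 * g + 0.114 * b for row in small for (r, g, b) in row]
    mean = sum(gray) / len(gray)
    bits = 0
    for value in gray:
        # Assumed bit rule: 1 if the pixel is brighter than the mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return f"{bits:016x}"
```

For example, a 16x16 image whose left half is white and right half is black hashes to `f0f0f0f0f0f0f0f0`, and slightly dimming the white or lifting the black leaves the hash unchanged, which is the kind of tolerance to minor modifications we're after.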

This is far from a complete version of image hashing, but it is a good enough first step for v0.1.

sambux1 commented 2 years ago

I'm removing the v0.1 label and adding the v0.3 label. This is enough to work through v0.2.

Edit: I'm going to undo that relabel so we can see this as progress in the v0.1 category, and I'll open a new issue for a future version of image hashing.