sambux1 closed this issue 2 years ago
If we can find a video- or image-hashing algorithm that turns the media into a string, this is very easy to implement. The interface would just need an object representing what is being uploaded and the file path of the media, which is the input to that algorithm. We can then use the string output as the _s_data value we're currently using for blocks.
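A minimal sketch of what that interface could look like. The names `MediaUpload` and `hash_media` are hypothetical, and SHA-256 over the file bytes is only a placeholder (it is a cryptographic hash, not a perceptual one) standing in for whatever media-hashing algorithm ends up being chosen:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MediaUpload:
    """Hypothetical object describing what is being uploaded."""
    path: str        # file path of the media on disk
    media_type: str  # e.g. "image" or "video"


def hash_media(upload: MediaUpload) -> str:
    """Return a hex string that could be stored as the block's _s_data value.

    Placeholder: hashes the raw file bytes with SHA-256. A real image or
    video hash would replace the body of this function.
    """
    digest = hashlib.sha256()
    with open(upload.path, "rb") as f:
        # Read in chunks so large video files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because callers only see a string coming back, swapping the placeholder for a perceptual hash later should not change the rest of the block code.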
It doesn't matter if the algorithm is slow or old; it just needs to work so we can get the system off the ground.
Starting with image hashing: video hashing is much more complex, so we'll build up to that. I'm looking into perceptual hashes, which hash an image to the same (or nearly the same) value despite minor modifications (e.g. small changes in resolution or color, added text, etc.). I found an implementation of pHash, a well-known open-source algorithm, so I'll try implementing that this week.
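Since a minor modification can still flip a few bits of a perceptual hash, two hashes are usually compared by counting the bits that differ (the Hamming distance) rather than by testing for exact equality. A small sketch, assuming the hashes are equal-length hex strings; the function names and the threshold of 5 bits are illustrative assumptions, not values taken from pHash:

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the two hashes disagree.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def looks_similar(hash_a: str, hash_b: str, max_distance: int = 5) -> bool:
    """Treat two images as the same if their hashes differ in only a few bits.

    max_distance is an assumed threshold and would need tuning on real images.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance
```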
I'll also write up a guide on digital signal/image processing which would be useful for programming and image hashing. We'll need a "hashing guide" which covers general hashing algorithms, image hashing, and video hashing soon, too.
Here is a basic guide with some pseudocode for implementing a perceptual hash, though the result is not as robust as pHash.
Here is a really robust perceptual hash developed by the person who created pHash (above). Implementing it will take more time, so we can leave it for now, but it should be strongly considered later.
I implemented a rough version of perceptual image hashing. For now, it just takes an image as input, downscales it to 8x8 pixels, converts it to grayscale, extracts one bit from each pixel, and outputs the resulting 64 bits as a 16-digit hex string.
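The steps described above can be sketched in pure Python as follows. This is not the code from the repository: the input format (a 2D list of RGB tuples), the block-averaging downscale, and the rule for extracting each bit (1 if the pixel is brighter than the mean of all 64, the usual "average hash" approach) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def to_gray(pixel: Pixel) -> int:
    """Integer luma approximation of an RGB pixel."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def shrink_to_8x8(image: Sequence[Sequence[Pixel]]) -> List[float]:
    """Downscale to 8x8 grayscale by averaging the pixels in each grid cell."""
    height, width = len(image), len(image[0])
    if height < 8 or width < 8:
        raise ValueError("image must be at least 8x8 pixels")
    cells = []
    for row in range(8):
        y0, y1 = row * height // 8, (row + 1) * height // 8
        for col in range(8):
            x0, x1 = col * width // 8, (col + 1) * width // 8
            total = sum(to_gray(image[y][x])
                        for y in range(y0, y1) for x in range(x0, x1))
            cells.append(total / ((y1 - y0) * (x1 - x0)))
    return cells


def average_hash(image: Sequence[Sequence[Pixel]]) -> str:
    """Return a 64-bit hash of the image as a 16-digit hex string."""
    cells = shrink_to_8x8(image)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per pixel: set when the pixel is brighter than the mean.
        bits = (bits << 1) | (1 if value > mean else 0)
    return f"{bits:016x}"
```

For example, a 16x16 image whose left half is white and whose right half is black gives the bit pattern 11110000 on every row:

```python
white, black = (255, 255, 255), (0, 0, 0)
half_and_half = [[white] * 8 + [black] * 8 for _ in range(16)]
print(average_hash(half_and_half))  # f0f0f0f0f0f0f0f0
```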
This is far from a complete version of image hashing, but it is a good enough first step for v0.1.
I'm removing the v0.1 label and adding the v0.3 label. This is enough to work through v0.2.
Edit: I'm going to undo that relabel so we can see this as progress in the v0.1 category, and I'll open a new issue for a future version of image hashing.
The system currently only supports text data as a test. At a minimum, we need to be able to upload and hash images and videos.
I will edit this issue with more information and a specific course of action.