Closed AGSPhoenix closed 5 years ago
users are starting to post colliding images
Did the files have the same size and pixel dimensions too?
In every case I've seen, yes. I'm not sure if that's an inherent property of the collision-generating techniques, but in both the post I linked and this collision, they were the same. (Edit: Forgot to mention, these two images were archived correctly because they are on different boards, and asagi only dedupes within a board.)
If you'd like to examine the original files for the first post, they're available here: https://www.rogdham.net/2017/03/12/gif-md5-hashquine.en
This is not something that occurs naturally often enough to be an issue. Double checking the size and dimensions should be enough to detect accidental collisions.
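A minimal sketch of that double check, assuming each post is a dict carrying `md5`, `fsize`, `w` and `h` keys (these key names are an assumption for illustration, not something confirmed in this thread). As noted above, the crafted collisions seen so far matched on size and dimensions too, so this only guards against accidental collisions:

```python
# Hypothetical post records; the key names "md5", "fsize", "w", "h" are assumed.
CHECKED_FIELDS = ("md5", "fsize", "w", "h")


def looks_like_same_file(a: dict, b: dict) -> bool:
    """Treat two posts as the same media only if the MD5, byte size and
    pixel dimensions all agree."""
    return all(a[field] == b[field] for field in CHECKED_FIELDS)


original = {"md5": "abc", "fsize": 1000, "w": 200, "h": 100}
accidental = {"md5": "abc", "fsize": 2048, "w": 640, "h": 480}
crafted = {"md5": "abc", "fsize": 1000, "w": 200, "h": 100}

accidental_match = looks_like_same_file(original, accidental)
crafted_match = looks_like_same_file(original, crafted)
```

Here the accidental collision is rejected because the size and dimensions differ, while a crafted pair with identical metadata still passes as a duplicate.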
For now, having sha256 would be a 'nice to have'. As long as the MD5 collision attacks require specially crafting both images, it would take a lot of dedication to exploit just to keep one image from being archived.
sha256 hashes would also provide us with some other media storage benefits I haven't gone into, but if it's not something you want to add lightly, go ahead and close the issue; it won't be a major blow for us.
Alrighty, I'll be closing this. It's unfortunately not something we can realistically add just because of archives. At least not for now.
MD5 collision attacks are easy enough nowadays that users are starting to post colliding images, and the archives are just reusing the image posted first. There's no good way to prevent this on the archiver side: downloading every image and computing the hash ourselves would be an enormous waste of resources, and perceptual hashing of the thumbnails would be complicated and error-prone.
If the API provided a SHA256 hash, checking that would basically eliminate this problem.
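To illustrate what that check could look like on the archiver side, here is a rough sketch. The `sha256` field is hypothetical (it is exactly what this issue is requesting), and `MediaIndex`, `dedupe_key` and the post dicts are made up for the example; this is not how asagi actually stores media.

```python
import hashlib


def dedupe_key(post: dict) -> tuple:
    """Prefer the hypothetical "sha256" field; fall back to MD5 alone."""
    sha = post.get("sha256")
    return ("sha256", sha) if sha else ("md5", post["md5"])


class MediaIndex:
    """Tracks which media have already been stored (per board, in practice)."""

    def __init__(self):
        self._seen = {}

    def register(self, post: dict) -> bool:
        """Return True if the media is new and should be downloaded."""
        key = dedupe_key(post)
        if key in self._seen:
            return False  # an earlier post's file would be reused
        self._seen[key] = post["filename"]
        return True


# Two different files that (by assumption) share an MD5 but not a SHA-256.
post_a = {"filename": "a.gif", "md5": "same-md5",
          "sha256": hashlib.sha256(b"image A").hexdigest()}
post_b = {"filename": "b.gif", "md5": "same-md5",
          "sha256": hashlib.sha256(b"image B").hexdigest()}

with_sha = MediaIndex()
sha_results = [with_sha.register(post_a), with_sha.register(post_b)]

# Without the extra field, the second image is silently dropped.
md5_only = MediaIndex()
md5_results = [
    md5_only.register({"filename": "a.gif", "md5": "same-md5"}),
    md5_only.register({"filename": "b.gif", "md5": "same-md5"}),
]
```

With the SHA-256 available, both files are kept; with MD5 alone, the second post falls back to the first image, which is the behaviour described above.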