4chan / 4chan-API

Documentation for 4chan's read-only JSON API.
http://www.4chan.org/

Request: image SHA256 hash #70

Closed AGSPhoenix closed 5 years ago

AGSPhoenix commented 5 years ago

MD5 collision attacks are easy enough nowadays that users are starting to post colliding images, and the archives are just reusing the image posted first. There's no good way to prevent this on the archiver side: downloading every image and computing the hash ourselves would be an enormous waste of resources, and perceptual hashing of the thumbnails would be complicated and error-prone.

If the API provided a SHA256 hash, checking that would basically eliminate this problem.
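For illustration only, here is a minimal sketch of what an archiver would have to compute for each downloaded file if it hashed media itself. It assumes the API's existing `md5` field is the base64-encoded MD5 digest of the file; the `sha256` key and the `media_hashes` helper are hypothetical names, since the API does not provide a SHA256 value.

```python
import base64
import hashlib


def media_hashes(data: bytes) -> dict:
    """Hash the raw bytes of one downloaded file.

    "md5" is base64-encoded, the form assumed here for the API's `md5`
    field. "sha256" is a hypothetical field name for the requested hash.
    """
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    sha256_hex = hashlib.sha256(data).hexdigest()
    return {"md5": md5_b64, "sha256": sha256_hex}
```

The hashing itself is cheap; the cost described above comes from having to download every full-size image before it can be hashed.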

desuwa commented 5 years ago

> users are starting to post colliding images

Did the files have the same size and pixel dimensions too?

AGSPhoenix commented 5 years ago

In every case I've seen, yes. I'm not sure if that's an inherent property of the collision-generating techniques, but in both the post I linked and this collision, they were the same. (Edit: Forgot to mention, these two images were archived correctly because they are on different boards, and asagi only dedupes within a board.)

If you'd like to examine the original files for the first post, they're available here: https://www.rogdham.net/2017/03/12/gif-md5-hashquine.en

desuwa commented 5 years ago

This is not something that occurs naturally often enough to be an issue. Double checking the size and dimensions should be enough to detect accidental collisions.
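The size-and-dimensions check suggested above could be sketched roughly as follows. It assumes each post object carries the `no`, `md5`, `fsize`, `w`, and `h` attributes from the API; `MediaIndex` and `media_key` are hypothetical names, and this is not how asagi or any other archiver is known to work. As noted earlier in the thread, the crafted collisions seen so far matched on size and dimensions too, so a check like this would only catch accidental collisions.

```python
from typing import Dict, Tuple

# (md5, file size in bytes, width, height)
MediaKey = Tuple[str, int, int, int]


def media_key(post: dict) -> MediaKey:
    """Build a dedupe key that is stricter than the MD5 alone."""
    return (post["md5"], post["fsize"], post["w"], post["h"])


class MediaIndex:
    """Tracks which post first supplied each distinct media key."""

    def __init__(self) -> None:
        self._first_seen: Dict[MediaKey, int] = {}

    def register(self, post: dict) -> int:
        """Return the number of the post whose stored file should be reused.

        If the key is new, the current post becomes the owner and its own
        number is returned, meaning the file must be downloaded.
        """
        return self._first_seen.setdefault(media_key(post), post["no"])
```

A caller would download the file only when `register` returns the post's own number, and otherwise link to the file already stored for the returned post.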

AGSPhoenix commented 5 years ago

For now, having SHA256 would be a 'nice to have'. As long as the MD5 collision attacks require specially crafting both images, it would take a lot of dedication to exploit them just to keep one image from being archived.

SHA256 hashes would also provide us with some other media storage benefits I haven't gone into, but if it's not something you want to add lightly, go ahead and close the issue; it won't be a major blow for us.

desuwa commented 5 years ago

Alrighty, I'll be closing this. It's unfortunately not something we can realistically add just because of archives. At least not for now.