4chan / 4chan-API

Documentation for 4chan's read-only JSON API.
http://www.4chan.org/

MD5 hash not matching #99

Open theFra985 opened 1 year ago

theFra985 commented 1 year ago

I've encountered odd behavior with the MD5 hashes of post images. For some images, the hash given by the API doesn't match the hash computed from the file downloaded from the CDN. (The downloaded images open correctly and match the preview.)

For example, the OP's image in thread 253748455 should have a packed MD5 of ujlTLUO2h564rwtTExF\/VA== (taken from the post entry returned by https://a.4cdn.org/a/thread/253748455.json), but the packed MD5 of the image I actually downloaded is AmVzMTHfH+fCv3RpwzccSg==.

I tried a few different ways of downloading the files and computing the MD5, because I thought my method of calculating it was the issue, but none of the results matched.

A quick example: run curl -v https://i.4cdn.org/a/1686810663343226.jpg | openssl dgst -md5 -binary | base64 and, on the saved file, openssl dgst -md5 -binary FILENAME | base64. The two computed hashes match each other, but neither matches the one given by the API.
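
For anyone who wants to reproduce the comparison without openssl, here is a minimal Python sketch of the same calculation (raw MD5 digest, then base64). The function names packed_md5 and api_md5_from_json are made up for illustration and are not part of the API. The sketch also assumes that the \/ in the value quoted above is only JSON escaping of /, which a JSON parser removes.

```python
import base64
import hashlib
import json


def packed_md5(data: bytes) -> str:
    """Return the base64 encoding of the raw 16-byte MD5 digest of data.

    Equivalent to: openssl dgst -md5 -binary | base64
    """
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def api_md5_from_json(raw_json_string: str) -> str:
    """Decode a JSON string literal such as "ujlT...\\/VA==".

    JSON allows "/" to be written as "\\/", so the value must be
    unescaped before it is compared with a locally computed hash.
    """
    return json.loads(raw_json_string)


if __name__ == "__main__":
    # Hypothetical local file name; substitute the downloaded image.
    with open("FILENAME", "rb") as fh:
        local = packed_md5(fh.read())
    # Value quoted in this issue, as it appears in the raw JSON response.
    remote = api_md5_from_json(r'"ujlTLUO2h564rwtTExF\/VA=="')
    print(local, remote, local == remote)
```

If the two values still differ after unescaping, the downloaded bytes are not the bytes that were hashed by the API.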

Am I missing something? Is there more information included in the hashed data?

bakugo commented 1 year ago

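Cloudflare Polish is to blame. It was enabled on 4chan a few weeks ago, and it re-optimizes images served from the cache, so the bytes you download differ from the original upload that the API's MD5 describes. You can get the original image by bypassing the Cloudflare cache, which can be done by appending a random query string to the URL.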
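
A rough sketch of the random query string idea in Python follows. The helper name cache_busting_url and the parameter name cb are arbitrary choices for illustration; nothing in this thread says the server expects a particular name, and whether the bypass works on every board is not guaranteed.

```python
import secrets
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_url(url: str, param: str = "cb") -> str:
    """Return url with a random query parameter appended.

    The parameter name is an assumption; the only point is that the
    full URL is one the cache has not seen before.
    """
    parts = urlsplit(url)
    # Preserve any query parameters the URL already has.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, secrets.token_hex(8)))  # 16 random hex characters
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(cache_busting_url("https://i.4cdn.org/a/1686810663343226.jpg"))
```

Downloading the resulting URL and hashing the bytes should then give a value to compare against the md5 field from the API.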

theFra985 commented 1 year ago

Thank you! After checking, that is in fact the cause of the mismatch.

I think it should be documented in the md5 property section to prevent confusion.

Tyranical commented 5 months ago

This does not work for /b/.