commonsmachinery / blockhash

blockhash.io
MIT License

Use EXIF orientation tag when processing JPEG images #4

Open denisnazarov opened 9 years ago

denisnazarov commented 9 years ago

Hello, I wanted to document a drastically different Hamming distance between two identical images saved with different compression.

compressed.jpg fe7ffefffefffc3cf839f823f81cf818f00ff00ff007e007e003e003c0000000
original.jpg ecfce4fcf2f0fbc0fb00fc00e00060000000e000f000ff00ffe0fffcfffefffe

Using the Hamming distance function in blockhash-js, I get a value of 120.
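For reference, the Hamming distance here is the number of differing bits between the two 256-bit hashes (64 hex digits × 4 bits each), so 120 means nearly half the bits disagree. A minimal Python sketch, assuming both hashes are equal-length hex strings; `hamming_distance` is a hypothetical name, not the blockhash-js API:

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR the hashes as integers, then count the 1 bits in the result.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


compressed = "fe7ffefffefffc3cf839f823f81cf818f00ff00ff007e007e003e003c0000000"
original = "ecfce4fcf2f0fbc0fb00fc00e00060000000e000f000ff00ffe0fffcfffefffe"
print(hamming_distance(compressed, original))  # prints 120
```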

The images: https://s3.amazonaws.com/denisnazarov/blockhash/compressed.jpg https://s3.amazonaws.com/denisnazarov/blockhash/original.jpg

Has this implementation been tested with high resolution images? I can add some failing tests.

denisnazarov commented 9 years ago

The original file was an uncompressed iPhone JPEG.

After further investigation, I realized that blockhash doesn't consider the orientation stored in the metadata. After I stripped the metadata to upload the file to S3, the image was sideways, so blockhash was hashing it sideways and producing a completely different value. This wasn't obvious because most OS X viewers render iPhone photos in the correct orientation.
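To illustrate why a sideways copy hashes so differently, here is a deliberately simplified block-mean hash in Python. It is not blockhash's actual algorithm; the tiny grayscale grid, the 2×2 block layout, and the names `block_mean_hash` and `rotate_cw` are all hypothetical. It only shows that a block-based hash encodes *where* bright and dark regions sit, so rotating the pixels moves the bits:

```python
def rotate_cw(pixels):
    """Rotate a grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def block_mean_hash(pixels, blocks=2):
    """Simplified hash: one bit per block, 1 if the block mean exceeds the
    mean of all block means. Assumes dimensions divisible by `blocks`."""
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // blocks, w // blocks
    means = []
    for by in range(blocks):          # block rows, top to bottom
        for bx in range(blocks):      # block columns, left to right
            vals = [pixels[y][x]
                    for y in range(by * bh, (by + 1) * bh)
                    for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(vals) / len(vals))
    overall = sum(means) / len(means)
    return "".join("1" if m > overall else "0" for m in means)


# Hypothetical 4x4 grayscale image: bright top half, dark bottom half.
upright = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
sideways = rotate_cw(upright)  # bright half is now on the right

print(block_mean_hash(upright))   # prints 1100
print(block_mean_hash(sideways))  # prints 0101
```

Half of the four bits flip for the same picture, which mirrors the large distance seen with the real 256-bit hashes.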

Can you suggest any way of accounting for orientation?

Feel free to close this issue. :+1:

jonasob commented 9 years ago

Very interesting! As a general note, you should also pay attention to #2 and the improved version in the m4-improv branch, which we'll likely merge into master soon but which will be a breaking change. That version doesn't solve this particular issue, but it seems to me that we should take orientation into consideration. Most likely the best way would simply be to ignore orientation metadata if we can. @artfwo, what do you think?

artfwo commented 9 years ago

The orientation tag should be taken into account, since the reverse case (matching pixel data but a different orientation) is less likely to happen.

petli commented 9 years ago

I included this in the RFC draft update: https://github.com/commonsmachinery/blockhash-rfc/blob/m4/draft-commonsmachinery-urn-blockhash-00.txt#L195-L196