drewnoakes / metadata-extractor

Extracts Exif, IPTC, XMP, ICC and other metadata from image, video and audio files
Apache License 2.0

ExifDirectoryBase.TAG_USER_COMMENT doesn't seem to decode properly #525

Closed: cowwoc closed this issue 3 years ago

cowwoc commented 3 years ago

Version 2.15.0

My understanding, per the UserComment section of https://exiftool.org/faq.html#Q10, is that values are typically encoded as UTF-8 and stored in ASCII format. When I run exiftool -v3 on one of my JPG files, I get:

  | | 15) UserComment = ASCIITime changed from 2005:02:12 20:31:37-0500 to 2005:02:12 1[snip]
  | |     - Tag 0x9286 (108 bytes, undef[108]):
  | |         1330: 41 53 43 49 49 00 00 00 54 69 6d 65 20 63 68 61 [ASCII...Time cha]
  | |         1340: 6e 67 65 64 20 66 72 6f 6d 20 32 30 30 35 3a 30 [nged from 2005:0]
  | |         1350: 32 3a 31 32 20 32 30 3a 33 31 3a 33 37 2d 30 35 [2:12 20:31:37-05]
  | |         1360: 30 30 20 74 6f 20 32 30 30 35 3a 30 32 3a 31 32 [00 to 2005:02:12]
  | |         1370: 20 31 32 3a 33 31 3a 33 37 2d 30 35 30 30 2e 0a [ 12:31:37-0500..]

The actual comment starts with "Time changed from". The word "ASCII" in front seems to indicate what encoding exiftool is using.

When I try reading this same exif tag using Beyond Compare I get:

41 53 43 49 49 00 00 00 54 69 6D 65 20 63 68 61 6E 67 65 64 20 66 72 6F 6D 20 32 30 30 35 3A 30 32 3A 31 32 20 32 30 3A 33 31 3A 33 37 2D 30 35 30 30 20 74 6F 20 32 30 30 35 3A 30 32 3A 31 32 20 31 32 3A 33 31 3A 33 37 2D 30 35 30 30 2E 0A 52 65 61 73 6F 6E 3A 20 43 61 6D 65 72 61 20 74 69 6D 65 20 6E 6F 74 20 73 65 74 2E

which matches exiftool's output above.

However, when I decode this same tag using metadata-extractor I get:

65 83 67 73 73 0 0 0 84 105 109 101 32 99 104 97 110 103 101 100 32 102 114 111 109 32 50 48 48 53 58 48 50 58 49 50 32 50 48 58 51 49 58 51 55 45 48 53 48 48 32 116 111 32 50 48 48 53 58 48 50 58 49 50 32 49 50 58 51 49 58 51 55 45 48 53 48 48 46 10 82 101 97 115 111 110 58 32 67 97 109 101 114 97 32 116 105 109 101 32 110 111 116 32 115 101 116 46

which seems to be incorrect. Can you please confirm you see this problem?

To reproduce, take any JPG file on your end, run exiftool "-UserComment=Hello World" my.jpg and try to decode it back.

drewnoakes commented 3 years ago

Can you share a sample image and your source code, to make sure we're on the same page? If you open the file in a hex editor, what bytes do you see? The library does not convert encodings at all.

cowwoc commented 3 years ago

Okay, here is a concrete example: test

exiftool -v3 output:

  | | 16) UserComment = ASCIIHello world
  | |     - Tag 0x9286 (19 bytes, undef[19]):
  | |         0318: 41 53 43 49 49 00 00 00 48 65 6c 6c 6f 20 77 6f [ASCII...Hello wo]
  | |         0328: 72 6c 64                                        [rld]

Output from this library: 65 83 67 73 73 0 0 0 72 101 108 108 111 32 119 111 114 108 100

Using a hex editor, I see the bytes mentioned by exiftool but not the ones mentioned by this library.

drewnoakes commented 3 years ago

> When I try reading this same exif tag using Beyond Compare I get:
>
> 41 53 43 49 49 00 00 00 54 69 6D 65 20 63 68 61 6E 67 65 64 20 66 72 6F 6D 20 32 30 30 35 3A 30 32 3A 31 32 20 32 30 3A 33 31 3A 33 37 2D 30 35 30 30 20 74 6F 20 32 30 30 35 3A 30 32 3A 31 32 20 31 32 3A 33 31 3A 33 37 2D 30 35 30 30 2E 0A 52 65 61 73 6F 6E 3A 20 43 61 6D 65 72 61 20 74 69 6D 65 20 6E 6F 74 20 73 65 74 2E
>
> which matches exiftool's output above.
>
> However, when I decode this same tag using metadata-extractor I get:
>
> 65 83 67 73 73 0 0 0 84 105 109 101 32 99 104 97 110 103 101 100 32 102 114 111 109 32 50 48 48 53 58 48 50 58 49 50 32 50 48 58 51 49 58 51 55 45 48 53 48 48 32 116 111 32 50 48 48 53 58 48 50 58 49 50 32 49 50 58 51 49 58 51 55 45 48 53 48 48 46 10 82 101 97 115 111 110 58 32 67 97 109 101 114 97 32 116 105 109 101 32 110 111 116 32 115 101 116 46

The first set of values is in hexadecimal. The second set is the same bytes, just printed in decimal (for example, hex 41 is decimal 65).
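To see that the two dumps describe the same bytes, here is a minimal sketch that prints one byte array both ways. The class name `ByteDump` and its methods are made up for illustration and are not part of metadata-extractor.

```java
import java.util.StringJoiner;

final class ByteDump {
    // Formats each byte as two-digit uppercase hex, the way a hex editor shows it.
    static String toHex(byte[] bytes) {
        StringJoiner out = new StringJoiner(" ");
        for (byte b : bytes) {
            out.add(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    // Formats each byte as an unsigned decimal number.
    static String toDecimal(byte[] bytes) {
        StringJoiner out = new StringJoiner(" ");
        for (byte b : bytes) {
            out.add(Integer.toString(b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The first eight bytes of the UserComment value from the report.
        byte[] prefix = {0x41, 0x53, 0x43, 0x49, 0x49, 0, 0, 0};
        System.out.println(toHex(prefix));     // 41 53 43 49 49 00 00 00
        System.out.println(toDecimal(prefix)); // 65 83 67 73 73 0 0 0
    }
}
```

Both lines describe the characters "ASCII" followed by three zero bytes; only the number base differs.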

cowwoc commented 3 years ago

Wow, that is just silly.

So instead of looking at exif.getString(ExifSubIFDDirectory.TAG_USER_COMMENT), I should have been looking at new String(exif.getByteArray(ExifSubIFDDirectory.TAG_USER_COMMENT)), which returns ASCIIHello world as expected.

Thank you for catching this.

drewnoakes commented 3 years ago

No worries! Yep, that code looks good.
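For readers who want the comment text without the leading "ASCII" marker: the Exif specification reserves the first 8 bytes of UserComment for a character code such as `ASCII\0\0\0` or `UNICODE\0`. The sketch below strips that prefix from a raw byte array such as the one returned by `getByteArray`. The class `UserCommentDecoder` is hypothetical, it handles only those two codes, and decoding `UNICODE` as UTF-16 is an assumption, since writers differ on byte order. The library's own `getDescription` for this tag may already do something similar, which is worth checking before rolling your own.

```java
import java.nio.charset.StandardCharsets;

final class UserCommentDecoder {
    // Exif reserves the first 8 bytes of UserComment for the character code.
    private static final int PREFIX_LENGTH = 8;

    static String decode(byte[] raw) {
        if (raw.length >= PREFIX_LENGTH) {
            String code = new String(raw, 0, PREFIX_LENGTH, StandardCharsets.US_ASCII);
            int bodyLength = raw.length - PREFIX_LENGTH;
            if (code.equals("ASCII\0\0\0")) {
                return new String(raw, PREFIX_LENGTH, bodyLength, StandardCharsets.US_ASCII);
            }
            if (code.equals("UNICODE\0")) {
                // Assumption: UTF-16, honoring a byte order mark if one is present.
                return new String(raw, PREFIX_LENGTH, bodyLength, StandardCharsets.UTF_16);
            }
        }
        // Assumption: with no recognized prefix, treat the whole value as UTF-8 text.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The 19 bytes reported in this thread for the "Hello world" test image.
        byte[] raw = {65, 83, 67, 73, 73, 0, 0, 0,
                      72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100};
        System.out.println(decode(raw)); // Hello world
    }
}
```

Passing an explicit charset also avoids the platform-default charset that `new String(byte[])` uses.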