bwindels / exif-parser

A javascript library to extract EXIF metadata from JPEG images, in node and in the browser.
MIT License
218 stars 53 forks

wrong encoding on exif strings (bug) #11

Open acidicX opened 8 years ago

acidicX commented 8 years ago

Hey there,

first things first: thanks for the module!

I'm having encoding problems with certain EXIF strings, e.g. the ImageDescription.

This is what other EXIF readers tell me: Fahrt über den Ozean
but I am getting this with exif-parser: Fahrt C<ber den Ozean

Seems to me like they are not read with the proper encoding.

I'd fix it myself, but I have no idea how the tags are encoded in the first place :/

Cheers, acidicX

langpavel commented 8 years ago

This might be hard. I'd expect this library to assume UTF-8 as the encoding, but your example looks like some 8-bit encoding is being used instead. @bwindels Can you briefly describe how this works? Thanks!

acidicX commented 8 years ago

I'm not sure if UTF-8 is in the EXIF specs. The image descriptions were edited by Adobe Lightroom, but Adobe sometimes hates standards (check the SVG export of Illustrator 👎).

langpavel commented 8 years ago

Hi, I figured this out. Using the ArrayBuffer, DataView and TextDecoder APIs (with a polyfill where needed), you can read UTF-8 strings. And yes, UTF-8 is the best choice: it is backward compatible with ASCII, and I have confirmed it works for Czech.
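A minimal sketch of that approach, assuming the string's byte offset and length have already been found by the tag parser. The helper name `readUtf8String` and its parameters are hypothetical and are not part of exif-parser's API:

```javascript
// Hypothetical helper (not exif-parser API): decode `length` bytes of a
// DataView, starting at `offset`, as a UTF-8 string.
function readUtf8String(view, offset, length) {
  // Look at the same memory as the DataView, without copying.
  const bytes = new Uint8Array(view.buffer, view.byteOffset + offset, length);
  // EXIF ASCII values end with a NUL byte; cut the string there if present.
  const end = bytes.indexOf(0);
  const trimmed = end === -1 ? bytes : bytes.subarray(0, end);
  return new TextDecoder('utf-8').decode(trimmed);
}

// Usage: build a NUL-terminated UTF-8 value and read it back.
const raw = new TextEncoder().encode('Fahrt über den Ozean\0');
const view = new DataView(raw.buffer, raw.byteOffset, raw.byteLength);
const description = readUtf8String(view, 0, raw.length);
console.log(description);
```

In environments without a native `TextDecoder`, a polyfill would have to be loaded before this runs.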

acidicX commented 7 years ago

@langpavel I just found that the bug still exists. Did you fork the lib to resolve it? It seems that PRs are not actively worked on anymore... Or did you find a better lib?

langpavel commented 7 years ago

Hi, I have no time to work on this, sorry..

bwindels commented 7 years ago

EXIF assumes ASCII and doesn't have a field to specify an encoding, so without using an encoding detection library, this will be hard to do. Since this library needs to work in the browser as well as in node.js, I'd be hesitant to add a big thing like encoding detection to it.

bwindels commented 7 years ago

More info here as well: https://stackoverflow.com/questions/19284205/safe-to-use-utf8-decoding-for-exif-property-marked-as-ascii

bwindels commented 7 years ago

I did notice that on node.js, the library forcefully decodes using ASCII, while in the browser it uses UTF-16 (compatible with ASCII). Ideally, it should decode using UTF-8 on both platforms, since that's the most widely used encoding and it is compatible with ASCII as well. Your example text might in fact be encoded as UTF-8. Browser support for UTF-8 decoding is not ubiquitous, so this might be hard to do cross-platform. I'll have a look.
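The garbled output reported above is consistent with UTF-8 bytes being pushed through a 7-bit decode. The sketch below assumes that such a decode simply clears the high bit of every byte; under that assumption, the two UTF-8 bytes of "ü" (0xC3 0xBC) become "C" (0x43) and "<" (0x3C):

```javascript
// UTF-8 bytes of the description reported by other EXIF readers.
const utf8Bytes = new TextEncoder().encode('Fahrt über den Ozean');

// Assumption: a 7-bit ASCII decode clears the high bit of each byte
// and maps the result to one character per byte.
const sevenBit = String.fromCharCode(...Array.from(utf8Bytes, (b) => b & 0x7f));

// Decoding the same bytes as UTF-8 recovers the original text.
const asUtf8 = new TextDecoder('utf-8').decode(utf8Bytes);

console.log(sevenBit); // Fahrt C<ber den Ozean
console.log(asUtf8);   // Fahrt über den Ozean
```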

bwindels commented 7 years ago

Released 0.1.11, which uses UTF-8 on node.js. If you want, you could test whether the description in your image now decodes properly on node.js. For the browser, we'd have to use TextDecoder if supported, and fall back to fromCodePoint and fromCharCode if not. I don't have time to do this right now, but you're welcome to make a PR.
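One way the browser side could look, as a rough sketch rather than the library's actual code. The function names `decodeUtf8Manually` and `decodeExifString` are made up for illustration. The manual path replaces malformed or truncated sequences with U+FFFD, does not reject overlong encodings or surrogate code points, and may place replacement characters differently from a native decoder:

```javascript
// Hypothetical fallback: decode UTF-8 bytes without TextDecoder.
function decodeUtf8Manually(bytes) {
  let out = '';
  let i = 0;
  while (i < bytes.length) {
    const b0 = bytes[i];
    let codePoint;
    let extra; // number of continuation bytes expected
    if (b0 < 0x80) { codePoint = b0; extra = 0; }
    else if ((b0 & 0xe0) === 0xc0) { codePoint = b0 & 0x1f; extra = 1; }
    else if ((b0 & 0xf0) === 0xe0) { codePoint = b0 & 0x0f; extra = 2; }
    else if ((b0 & 0xf8) === 0xf0) { codePoint = b0 & 0x07; extra = 3; }
    else { out += '\ufffd'; i += 1; continue; } // invalid lead byte

    // The sequence is valid only if all continuation bytes are present
    // and each has the form 10xxxxxx.
    let valid = i + extra < bytes.length;
    for (let k = 1; valid && k <= extra; k++) {
      const b = bytes[i + k];
      if ((b & 0xc0) !== 0x80) valid = false;
      else codePoint = (codePoint << 6) | (b & 0x3f);
    }
    if (!valid || codePoint > 0x10ffff) { out += '\ufffd'; i += 1; continue; }

    out += String.fromCodePoint(codePoint);
    i += extra + 1;
  }
  return out;
}

// Hypothetical entry point: prefer the native decoder when it exists.
function decodeExifString(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  return decodeUtf8Manually(bytes);
}
```

Pure ASCII input takes the single-byte branch, so existing ASCII tags would decode exactly as before.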

SergioCrisostomo commented 6 years ago

@acidicX if this issue is still a problem for you, could you share the image that causes it so others can look at it and try to fix it or suggest changes?

bwindels commented 6 years ago

TextDecoder seems reasonably supported nowadays. Worth a look at some point.