cozmo / jsQR

A pure javascript QR code reading library. This library takes in raw images and will locate, extract and parse any QR code found within.
https://cozmo.github.io/jsQR/
Apache License 2.0
3.7k stars, 607 forks

Assumes UTF-8 #129

Open kousu opened 5 years ago

kousu commented 5 years ago

BYTE mode implicitly assumes data is in UTF-8 because that's what encodeURIComponent/decodeURIComponent assume:

https://github.com/cozmo/jsQR/blob/01d3b0a3889b6da02486ea5c26e5bfaaa268d61a/src/decoder/decodeData/index.ts#L129-L146
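
To make the failure mode concrete, here is a minimal sketch of the percent-encoding pattern described above. It is not copied from the linked lines; the helper name `bytesToTextAssumingUtf8` is hypothetical and only illustrates why `decodeURIComponent` ties the result to UTF-8.

```typescript
// Hypothetical helper illustrating the pattern; not jsQR's actual source.
// Each byte becomes a "%XX" escape, and decodeURIComponent then interprets
// the escaped byte sequence as UTF-8.
function bytesToTextAssumingUtf8(bytes: number[]): string {
  const percentEncoded = bytes
    .map((b) => "%" + ("0" + b.toString(16)).slice(-2))
    .join("");
  try {
    return decodeURIComponent(percentEncoded);
  } catch {
    // decodeURIComponent throws URIError on byte sequences that are not valid UTF-8.
    return "";
  }
}

const utf8Text = bytesToTextAssumingUtf8([0xc3, 0xa9]); // "é" encoded as UTF-8
const latin1Text = bytesToTextAssumingUtf8([0xe9]); // "é" encoded as ISO-8859-1

console.log(JSON.stringify(utf8Text)); // decodes successfully
console.log(JSON.stringify(latin1Text)); // empty string: the lone 0xE9 byte is invalid UTF-8
```

The raw bytes survive either way, so a caller that knows the real encoding can still decode them separately.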

If the text is not in UTF-8 then appending to .text fails (renamed -> .data here for output), but the content is still available in .bytes (renamed -> .binaryData for output), so this isn't the end of the world.

It is, however, against the QR Spec. You implemented ECI in #71 by just recording the ECI designators in the chunks, but you're supposed to use them to set the character set in use for each BYTE block. The newer fork of zxing handles this by keeping CurrentCharset around and using it when it runs into a BYTE segment (and the old fork does basically the same thing with currentCharacterSetECI). Doing the same here means having a charset-to-charset conversion library like iconv available, or hardcoding the to-unicode translations.
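
A rough sketch of the "keep the current charset around" approach, assuming the platform's `TextDecoder` is available instead of iconv. The `Segment` type, the `ECI_TO_LABEL` table, and `decodeSegments` are hypothetical names for illustration, not jsQR's or zxing's API. The table covers only three ECI assignment numbers (3, 20, 26), and the ISO-8859-1 default for data with no ECI is an assumption that should be checked against the spec.

```typescript
// Hypothetical segment shape; jsQR's real chunk types may differ.
type Segment =
  | { type: "eci"; assignmentNumber: number }
  | { type: "byte"; bytes: number[] };

// Partial, illustrative mapping from ECI assignment numbers to TextDecoder labels.
// Caveat: TextDecoder treats the "iso-8859-1" label as windows-1252, which
// differs from true ISO-8859-1 in the 0x80-0x9F range.
const ECI_TO_LABEL: Record<number, string> = {
  3: "iso-8859-1",
  20: "shift_jis",
  26: "utf-8",
};

function decodeSegments(segments: Segment[], defaultLabel = "iso-8859-1"): string {
  let currentLabel = defaultLabel; // assumed default when no ECI has been seen
  let text = "";
  for (const segment of segments) {
    if (segment.type === "eci") {
      const label = ECI_TO_LABEL[segment.assignmentNumber];
      if (label === undefined) {
        throw new Error("Unsupported ECI assignment number: " + segment.assignmentNumber);
      }
      currentLabel = label; // applies to every following BYTE segment
    } else {
      text += new TextDecoder(currentLabel).decode(new Uint8Array(segment.bytes));
    }
  }
  return text;
}

// One "é" in the default charset, then an ECI switch to UTF-8 and another "é".
const decoded = decodeSegments([
  { type: "byte", bytes: [0xe9] },
  { type: "eci", assignmentNumber: 26 },
  { type: "byte", bytes: [0xc3, 0xa9] },
]);
console.log(decoded);
```

Relying on `TextDecoder` avoids shipping conversion tables, at the cost of depending on which encodings the runtime was built with.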

qrcode seems to make the same assumption, so you won't have interoperability problems if you keep all your QR coding in JavaScript, and you probably won't have problems at all unless you're mixing languages, but the bug is lurking.


The QR standard was written in an earlier age. As the authors of a popular implementation, you could probably get away with just declaring that UTF-8 is the only valid text encoding, which is a 67-times better solution than messing with the proprietary ECI character sets that are rapidly being forgotten. If you go that route, I would ask for a prominent note in the docs explaining that you're specifically only implementing a (sane) subset of the QR spec, and I would ask that you coordinate with qrcode so that they give up their plan to implement ECI and apply the same UTF-8-only restriction.