divan / txqr

Transfer data via animated QR codes
MIT License

Misguided base64 encoding #9

Open antong opened 5 years ago

antong commented 5 years ago

The encoders and decoders base64-encode the binary data. The blog post says this is so that only alphanumeric data is passed to the QR encoder. I assume this is because QR has an alphanumeric mode that encodes "alphanumeric" data more efficiently. However, base64 encoding doesn't help here, because QR "alphanumeric" mode cannot encode, e.g., lowercase letters; it covers only 0-9, A-Z, space, and $%*+-./: (45 characters in total). I suggest skipping the base64 step to reduce the data size and improve performance.
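To make the charset mismatch concrete, here is a small sketch that lists which characters of the standard base64 alphabet (RFC 4648, plus the `=` padding character) fall outside the 45-character QR alphanumeric set. The constant and function names (`qrAlphanumeric`, `base64Alphabet`, `outsideQRAlphanumeric`) are hypothetical and chosen for illustration; they are not part of txqr.

```go
package main

import (
	"fmt"
	"strings"
)

// qrAlphanumeric is the 45-character set allowed by QR alphanumeric mode.
const qrAlphanumeric = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"

// base64Alphabet is the standard base64 alphabet plus the '=' padding character.
const base64Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="

// outsideQRAlphanumeric returns, in order, the characters of alphabet
// that QR alphanumeric mode cannot represent.
func outsideQRAlphanumeric(alphabet string) string {
	var b strings.Builder
	for _, r := range alphabet {
		if !strings.ContainsRune(qrAlphanumeric, r) {
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	missing := outsideQRAlphanumeric(base64Alphabet)
	fmt.Println(len(qrAlphanumeric), len(missing), missing)
}
```

Running it prints `45 27 abcdefghijklmnopqrstuvwxyz=`: all 26 lowercase letters and the padding character are unsupported, so almost any base64 string will need byte mode anyway.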

DonaldTsang commented 5 years ago

But would it work on inputs with arbitrary bytes (e.g. UTF-8) if the changes are made?

antong commented 5 years ago

Yes, sure. QR codes can contain arbitrary binary data.
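As a rough illustration of why arbitrary bytes (including UTF-8 text, which is just a byte sequence) are fine, here is a sketch of how a QR byte-mode data segment is laid out: a 4-bit mode indicator `0100`, a character count indicator (8 bits for versions 1-9, 16 bits for versions 10-40), then each byte as 8 bits. The function name `byteSegmentBits` is hypothetical, and this covers only the segment, not the terminator, padding, or error correction of a full symbol.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// byteSegmentBits returns the bit string of a QR byte-mode segment for the
// given symbol version. Every byte value 0x00-0xFF is accepted unchanged.
func byteSegmentBits(data []byte, version int) (string, error) {
	if version < 1 || version > 40 {
		return "", errors.New("version must be 1-40")
	}
	countBits := 8 // character count indicator width for versions 1-9
	if version >= 10 {
		countBits = 16 // width for versions 10-40
	}
	if len(data) >= 1<<countBits {
		return "", errors.New("too many bytes for the count indicator")
	}
	var b strings.Builder
	b.WriteString("0100")                           // byte-mode indicator
	fmt.Fprintf(&b, "%0*b", countBits, len(data))   // zero-padded byte count
	for _, c := range data {
		fmt.Fprintf(&b, "%08b", c) // each byte verbatim as 8 bits
	}
	return b.String(), nil
}

func main() {
	bits, err := byteSegmentBits([]byte{0x00, 0xFF, 'A'}, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(bits), bits)
}
```

For the three bytes `0x00 0xFF 'A'` the segment is 4 + 8 + 3 × 8 = 36 bits; no byte value needs escaping or re-encoding.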

divan commented 5 years ago

Great input. You're right; I'm not sure the current approach is more efficient than just using QR's native binary mode. I was planning to run tests and do the math on that, but this project is currently near the bottom of my backlog. Still, it's good to keep this in mind, and I'll return to the issue when I have some time.

DonaldTsang commented 5 years ago

@antong So there is text mode, on top of which we could use base64 or some other base-N encoding, or we could use binary mode with UTF-8, which would be good to test.

@divan take it low and slow, my friend. It is better to have a better product later than a bad one now.

antong commented 5 years ago

The QR encoder selects the data encoding mode automatically and does quite a good job of choosing binary, text (alphanumeric), or numeric mode depending on the bytes being encoded. If you base64-encode the data first, you immediately incur 33% overhead (4 output characters for every 3 input bytes), and on top of that the QR encoder can probably no longer select any mode other than binary. So if the original data contained runs of numeric or alphanumeric bytes, it could have been encoded even more efficiently without base64.

Example encoding different 12-byte (96-bit) messages (https://play.golang.org/p/BC2892EZC9B):

DonaldTsang commented 5 years ago

@antong but I think QR codes can encode raw bytes as well, which is better than base64 followed by alphanumeric mode. Also, let us assume that the input is UTF-8 (or arbitrary data), not only numbers or ASCII.

antong commented 5 years ago

Yes, QR can encode arbitrary bytes; that is exactly my point. Base64-encoded bytes cannot normally be encoded using QR alphanumeric mode, because QR alphanumeric is uppercase-only. So base64 encoding just adds 33% overhead and then practically forces the QR encoder to use "byte" mode for the data. My point is that it is always better not to base64-encode, and to let the QR encoder either use "byte" mode directly or optimize by using the more efficient alphanumeric or numeric mode where possible.

DonaldTsang commented 5 years ago

QR byte encoding it is

xulihang commented 3 years ago

Base64 increases the size by 33%. I've implemented a simple web app for reading animated QR codes that uses the bytes directly: https://github.com/xulihang/AnimatedQRCodeReader