Does chunk the buffer affect multibyte characters?

dankogai / js-base64

Base64 implementation for JavaScript

BSD 3-Clause "New" or "Revised" License


Does chunk the buffer affect multibyte characters? #170

Closed by JC-Ge 1 year ago

JC-Ge commented 1 year ago

https://github.com/dankogai/js-base64/blob/34cd9344dae428adbde8084e28339a591bbdf7e5/base64.ts#L71

Does chunking the buffer affect multibyte characters? If it has no effect, I would like to know why.

dankogai commented 1 year ago

This is to prevent _fromCC (which is really String.fromCharCode.bind(String)) from receiving too many arguments. The number of arguments a function can accept is limited, and the limit differs between engines. 0x1000 = 4096 is a conservative value that works on all JS implementations this module supports. See also:
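For readers who want to see the idea in isolation, here is a minimal sketch of chunked conversion. It is not the library's actual code: the names `bytesToBinaryString` and `DEFAULT_CHUNK` are hypothetical, and only the 0x1000 chunk size is taken from the comment above. Passing a very large array to String.fromCharCode in one call can throw a RangeError on some engines, so the bytes are fed in slices.

```typescript
// Hypothetical names for illustration; only the 0x1000 value comes from the thread.
const DEFAULT_CHUNK = 0x1000; // 4096 arguments per call

function bytesToBinaryString(
  bytes: Uint8Array,
  chunkSize: number = DEFAULT_CHUNK
): string {
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // subarray() returns a view on the same memory, so nothing is copied here.
    const chunk = bytes.subarray(i, i + chunkSize);
    // Each byte (0..255) becomes exactly one UTF-16 code unit in the output.
    parts.push(String.fromCharCode(...Array.from(chunk)));
  }
  return parts.join("");
}

// Usage: 10,000 bytes are converted in three calls instead of one huge call.
const sampleBytes = new Uint8Array(10000).map((_, i) => i % 256);
const sampleString = bytesToBinaryString(sampleBytes);
```

Because every byte maps to one character independently of its neighbours, the chunk size changes only how many calls are made, not the resulting string.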

JC-Ge commented 1 year ago

I see in https://stackoverflow.com/questions/12710001/how-to-convert-uint8-array-to-base64-encoded-string/12713326#12713326 that this is only correct if your buffer only contains non-multibyte ASCII characters.

If my string contains Chinese characters, doesn't this code have a problem?

dankogai commented 1 year ago

Look at the source carefully. It is within _fromUint8Array(), which deals with a Uint8Array, not a String. Each element is always a single byte.

JC-Ge commented 1 year ago

Always single-byte?

I see you use TextEncoder (as _TE) in _fromUint8Array(_TE.encode(s)).

In this case: new TextEncoder().encode('我'); // [230,136,145], which is 3 bytes

dankogai commented 1 year ago

_fromUint8Array(_TE.encode(s)) is there to implement Base64.encode(), which encodes a string as UTF-8 and then to Base64. Once it is a Uint8Array, you do not have to worry about multibyte handling, since that has been done beforehand. That is the whole point of using TextEncoder: let it handle multibyte characters and stop worrying about them for the rest of the process.
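The point can be checked with a small experiment. The sketch below is an illustration under assumptions, not the library's implementation: `toBinaryString` is a hypothetical helper, and a chunk size of 2 is chosen on purpose so that the 3-byte UTF-8 sequence for '我' from the example above is split across two chunks.

```typescript
// Hypothetical helper for illustration; not taken from base64.ts.
function toBinaryString(bytes: Uint8Array, chunkSize: number): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += chunkSize) {
    // One byte in, one character out; no decoding of multibyte sequences happens here.
    out += String.fromCharCode(...Array.from(bytes.subarray(i, i + chunkSize)));
  }
  return out;
}

// TextEncoder has already turned the character into plain bytes.
const utf8Bytes = new TextEncoder().encode("我"); // 3 bytes: 230, 136, 145

// Chunk size 2 splits the sequence as [230, 136] and [145].
const splitResult = toBinaryString(utf8Bytes, 2);
// Chunk size 0x1000 handles all three bytes in a single call.
const wholeResult = toBinaryString(utf8Bytes, 0x1000);

// Reverse the mapping (char code back to byte) and decode as UTF-8.
const roundTripBytes = Uint8Array.from(
  Array.from(splitResult, (c) => c.charCodeAt(0))
);
const roundTripText = new TextDecoder().decode(roundTripBytes);
```

Both chunk sizes give the same three-character string, and decoding the bytes again recovers the original character, so where the chunk boundary falls does not matter.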