dankogai / js-base64

Base64 implementation for JavaScript
BSD 3-Clause "New" or "Revised" License
4.27k stars 1.33k forks

This version of base64 increases the data size by 100% #160

Closed 1MikeMakuch closed 1 year ago

1MikeMakuch commented 1 year ago

base64 is supposed to increase data size by 33% - 37%, https://en.wikipedia.org/wiki/Base64

This version doubles the data size! Apparently it is a naive version of base64.

dankogai commented 1 year ago

For very small data Base64 can even quadruple the data. e.g.

Base64.encode('A') === 'QQ=='

Which is NOT naive. It is just official.
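
The quadrupling comes from padding: padded Base64 emits 4 characters for every group of 3 input bytes, rounding up. A minimal sketch of that arithmetic follows; the helper name `base64Length` is made up for illustration and is not part of js-base64.

```javascript
// Length of padded Base64 output for a given number of input bytes:
// 4 output characters per 3-byte group, with the last group rounded up.
function base64Length(nBytes) {
  return 4 * Math.ceil(nBytes / 3);
}

console.log(base64Length(1));       // 4, the length of 'QQ=='
console.log(base64Length(6));       // 8
console.log(base64Length(1000000)); // 1333336
```

For inputs of more than a few bytes the overhead settles at about 33%, so a consistent 100% increase has to come from something other than Base64 itself.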

1MikeMakuch commented 1 year ago

50k encodes to 100k, consistently. Why?

dankogai commented 1 year ago

Would you show me an example? Here is a counterexample in which 6 bytes of original data encode to 8 bytes of Base64.

> var str = "Base64"
undefined
> str.length
6
> Base64.encode(str)
'QmFzZTY0'
> Base64.encode(str).length
8

1MikeMakuch commented 1 year ago

Happy to. I've been using it to encode small images, which are much larger than your example. The output size appears to depend on the data too, unlike the base64 module, which is consistently 33% larger. For example, I first tried a 50k string of the letter 'a', and the jsbase64 output was the same length as the base64 output. But with random binary data, as in images, the jsbase64 output is double the input.

Here is a simple example showing it:

$ cat scripts/base64.js 
#!/usr/bin/env node

//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')

let s = fs.readFileSync(process.argv[2], 'binary')

const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)

console.log('s         ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64  ', s_base64.length)

$ dd if=/dev/urandom of=1MB bs=1MB count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB, 977 KiB) copied, 0.002143 s, 467 MB/s
$ 
$ ls -l 1MB 
-rw-r--r-- 1 mkm staff 1000000 Dec  7 20:25 1MB

$ node scripts/base64.js 1MB 
s          1000000
s_jsbase64 2000016
s_base64   1333336

Hope that helps. Cheers Mike

dankogai commented 1 year ago

I think I got it. Base64.encode treats input as a string which is NOT binary. See

https://github.com/dankogai/js-base64#decode-vs-atob-and-encode-vs-btoa

Use Base64.btoa() or Base64.fromUint8Array()
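
The difference between the two paths can be reproduced with Node's built-in Buffer alone. This is only a sketch: it uses Buffer as a stand-in and does not call js-base64, and it assumes the text path encodes the string as UTF-8 before applying Base64. The byte values are arbitrary sample data.

```javascript
// Stand-in for fs.readFileSync(path, 'binary'): one character per byte,
// with character codes in the range 0..255.
const bytes = Buffer.from([0x00, 0x41, 0x7f, 0x80, 0xc3, 0xff]);
const binaryString = bytes.toString('latin1');

// Text path: the string is encoded as UTF-8 first, so every character
// with code >= 0x80 turns into 2 bytes before Base64 is applied.
const asText = Buffer.from(binaryString, 'utf8').toString('base64');

// Byte path: every character maps back to exactly one byte.
const asBytes = Buffer.from(binaryString, 'latin1').toString('base64');

console.log(asText.length);  // 12 (9 UTF-8 bytes)
console.log(asBytes.length); // 8  (6 raw bytes)
```

Under that assumption, the byte path plays the role that Base64.btoa() and Base64.fromUint8Array() are recommended for above.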

1MikeMakuch commented 1 year ago

I don't understand what you're suggesting.. are you suggesting that base64.encode() does not work on binary data?

Both work. I encode and then decode and end up with the original binary file.

$ cat scripts/base64.js 
#!/usr/bin/env node

//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')

let s = fs.readFileSync(process.argv[2], 'binary')

const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)

console.log('s         ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64  ', s_base64.length)

const s_base64_back = base64.decode(s_base64)
fs.writeFileSync('base64_' + process.argv[2], s_base64_back, 'binary')

const s_jsbase64_back = jsbase64.decode(s_jsbase64)
fs.writeFileSync('jsbase64_' + process.argv[2], s_jsbase64_back, 'binary')

$ node ~/scripts/base64.js helloWorld-10MB.png 
s          10000000
s_jsbase64 20004252
s_base64   13333336

$ ls -l *helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec  7 23:09 base64_helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec  7 23:09 helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec  7 23:09 jsbase64_helloWorld-10MB.png

$ cmp helloWorld-10MB.png base64_helloWorld-10MB.png 
$ cmp helloWorld-10MB.png jsbase64_helloWorld-10MB.png 

1MikeMakuch commented 1 year ago

I noticed that in your code you reference the wiki page on base64, which clearly states that base64 needs 33% more memory. But I've shown you that your jsbase64 uses approximately double memory in my example above. Can you speak to this issue?

dankogai commented 1 year ago

I don't understand what you're suggesting.. are you suggesting that base64.encode() does not work on binary data?

No, it does not. It is for UTF-8 strings, and a string is not binary. Base64.encode() was born before Uint8Array. There were only atob() and btoa(), which could not handle UTF-8 correctly.

And the reason Base64.encode() emits more bytes than Base64.btoa() is that it encodes the string to UTF-8 bytes before it encodes those bytes to Base64.
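
That explanation also accounts for the numbers reported earlier in the thread. In UTF-8, every character with a code of 0x80 or above takes 2 bytes, and in uniformly random data about half of the bytes are that high, so the data grows by roughly 1.5x before Base64 adds its own 4/3, giving about 2x overall. The sketch below assumes this UTF-8-first behaviour and uses only Node built-ins; `predictedTextPathLength` is a hypothetical helper, not a js-base64 function.

```javascript
const crypto = require('crypto');

// Predicted Base64 length when a one-character-per-byte string is UTF-8
// encoded first. highBytes is the number of bytes with value >= 0x80,
// each of which becomes 2 bytes in UTF-8.
function predictedTextPathLength(totalBytes, highBytes) {
  const utf8Bytes = totalBytes + highBytes;
  return 4 * Math.ceil(utf8Bytes / 3);
}

console.log(predictedTextPathLength(1000000, 500000)); // 2000000
console.log(predictedTextPathLength(50000, 0));        // 66668, same as plain Base64

// Check the prediction against Node's Buffer on random sample data.
const data = crypto.randomBytes(3000);
const high = data.reduce((n, b) => n + (b >= 0x80 ? 1 : 0), 0);
const actual = Buffer.from(data.toString('latin1'), 'utf8').toString('base64').length;
console.log(actual === predictedTextPathLength(data.length, high)); // true
```

By the same formula, the reported length of 2000016 for the 1000000-byte file corresponds to about 500012 bytes at or above 0x80, and an all-ASCII input such as 50k of 'a' shows no extra growth.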

1MikeMakuch commented 1 year ago

Please define what you mean by "binary".

dankogai commented 1 year ago

Exactly as ECMAScript defines binary. If you do not understand what I am talking about, study the history of JS further. The Node.js Buffer documentation may also help.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
https://nodejs.org/api/buffer.html#buffers-and-typedarrays

1MikeMakuch commented 1 year ago

I was programming long before JavaScript was invented, sir. The term "binary" data is more general than that definition. In general, it simply means data that includes bytes with the 8th bit set. In any case, that is all I meant when I said "binary" data. But we digress. The issue I am posting about is not the definition of binary data.

I have shown that base64 encodes my data with only ~33-37% additional space, while your jsbase64 takes 100% additional space. Can you address this?

Here is an even simpler example with only 100 bytes:

$ cat scripts/base64.js

#!/usr/bin/env node

//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')

// The string s contains binary data. bit 8 is set on some of the bytes, see below

let s = fs.readFileSync(process.argv[2], 'binary')

const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)

console.log('s         ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64  ', s_base64.length)

const s_base64_back = base64.decode(s_base64)
fs.writeFileSync('base64_' + process.argv[2], s_base64_back, 'binary')

const s_jsbase64_back = jsbase64.decode(s_jsbase64)
fs.writeFileSync('jsbase64_' + process.argv[2], s_jsbase64_back, 'binary')

#
# create a file of 100 bytes of random data, it contains 8 bit bytes. I consider this a binary file.
#

$ dd if=/dev/urandom of=100b bs=100 count=1
1+0 records in
1+0 records out
100 bytes copied, 0.000135 s, 741 kB/s

$ ls -l 100b 
-rw-r--r-- 1 mkm staff 100 Dec  8 00:02 100b

#
# you can see the bytes, some are 8 bit set bytes;
#

$ od -c 100b 
0000000   ?   ?   ?   n   '   ?   l   Y   >   _   ?   ? 210 223   ? 035
0000020   ?   " 230   ;   ?   _   l   6   ? 025   ^   ?   ?   t   ?   3
0000040   R   ?   ?   ?   F   ?   ?   8 002  \t   ?   x   d   [   %   ?
0000060 024   O 177   ?   3   ?   ? 037   ?   5   ?   ?   ?   ? 234 036
0000100   e   ? 027   ? 027   6 021   ?   3   ?   ?   t   ?   ?   ?   ?
0000120   w 231   ? 203 211   ?   ?   ?   +   ? 200 237  \v   t   p   &
0000140   i   m   ?   I

#
# now run the script. It shows that jsbase64 takes 100% additional memory;
#

$ node ~/scripts/base64.js 100b 
s          100
s_jsbase64 204
s_base64   136

$ ls -l *100b
-rw-r--r-- 1 mkm staff 100 Dec  8 00:02 100b
-rw-r--r-- 1 mkm staff 100 Dec  8 00:02 base64_100b
-rw-r--r-- 1 mkm staff 100 Dec  8 00:02 jsbase64_100b

#
# cmp compares two files. If they are different it prints a message. These 3 files are identical;

$ cmp 100b base64_100b 
$ cmp 100b jsbase64_100b 

BTW that mozilla.org page does NOT define binary data. It defines ArrayBuffer.

dankogai commented 1 year ago

Can you address this?

I cannot address what is already addressed. This module already has .btoa() and .fromUint8Array() for that.

You are simply using this module wrong. Case dismissed.