Closed. MikeMakuch closed this issue 1 year ago.
For very small data Base64 can even quadruple the data. e.g.
Base64.encode('A') === 'QQ=='
Which is NOT naive. It is just official.
50k encodes to 100k, consistently. Why?
Would you show me an example? Here is a counterexample in which 6 bytes of original data encode to 8 bytes of Base64.
> var str = "Base64"
undefined
> str.length
6
> Base64.encode(str)
'QmFzZTY0'
> Base64.encode(str).length
8
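Both data points above fit the usual padded Base64 rule: every group of up to 3 input bytes becomes 4 output characters, so n bytes give 4 * ceil(n / 3) characters. A minimal sketch of that arithmetic, using Node's built-in Buffer rather than js-base64 (the helper name expectedBase64Length is made up for illustration):

```javascript
// Padded Base64 emits 4 characters for every group of up to 3 input bytes.
// expectedBase64Length is a hypothetical helper, not part of js-base64 or base-64.
function expectedBase64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3)
}

for (const text of ['A', 'Base64']) {
  const bytes = Buffer.from(text, 'utf8')
  const encoded = bytes.toString('base64')
  // Prints the input, its byte count, the encoded form, and both lengths.
  console.log(text, bytes.length, encoded, encoded.length, expectedBase64Length(bytes.length))
}
```

So a single byte quadruples (1 byte to 4 characters) while larger inputs settle at roughly a 33% increase; for example, expectedBase64Length(1000000) is 1333336.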
Happy to. I've been using it to encode small images, so much larger than your example. But the size appears to depend on the data too, unlike the base64 module, which is consistently 33% larger. For example, I first tried a string of 50k of the letter 'a', and the jsbase64 output was the same length as the base64 output. But with random binary data, like in images, the jsbase64 output is double the input size.
Here is a simple example showing it:
$ cat scripts/base64.js
#!/usr/bin/env node
//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')
let s = fs.readFileSync(process.argv[2], 'binary')
const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)
console.log('s ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64 ', s_base64.length)
$ dd if=/dev/urandom of=1MB bs=1MB count=1
1+0 records in
1+0 records out
1000000 bytes (1.0 MB, 977 KiB) copied, 0.002143 s, 467 MB/s
$
$ ls -l 1MB
-rw-r--r-- 1 mkm staff 1000000 Dec 7 20:25 1MB
$ node scripts/base64.js 1MB
s 1000000
s_jsbase64 2000016
s_base64 1333336
Hope that helps. Cheers Mike
I think I got it. Base64.encode treats input as a string, which is NOT binary. See
https://github.com/dankogai/js-base64#decode-vs-atob-and-encode-vs-btoa
Use Base64.btoa() or Base64.fromUint8Array().
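To make the string-versus-bytes distinction concrete, here is a rough sketch using only Node's Buffer. It assumes, per the comment above, that a string encoder UTF-8 encodes its input first, while a btoa-style encoder takes each character as one byte. It imitates those two paths and does not call js-base64 itself.

```javascript
// A "binary string" like the one fs.readFileSync(path, 'binary') returns:
// one character per byte, code points 0x00..0xFF.
const bytes = Buffer.from([0x41, 0x42, 0x43, 0x80, 0x90, 0xff])
const binaryString = bytes.toString('latin1')

// Path 1: treat each character as exactly one byte (btoa-style).
const asBytes = Buffer.from(binaryString, 'latin1')

// Path 2: treat the string as text and UTF-8 encode it first
// (assumed model of a string encoder such as Base64.encode).
const asUtf8 = Buffer.from(binaryString, 'utf8')

console.log(asBytes.length, asBytes.toString('base64').length) // 6 8
console.log(asUtf8.length, asUtf8.toString('base64').length)   // 9 12
```

The three characters at or above 0x80 each take two bytes in UTF-8, which is where the extra output comes from.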
I don't understand what you're suggesting. Are you suggesting that base64.encode() does not work on binary data?
Both work. I encode and then decode and end up with the original binary file.
$ cat scripts/base64.js
#!/usr/bin/env node
//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')
let s = fs.readFileSync(process.argv[2], 'binary')
const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)
console.log('s ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64 ', s_base64.length)
const s_base64_back = base64.decode(s_base64)
fs.writeFileSync('base64_' + process.argv[2], s_base64_back, 'binary')
const s_jsbase64_back = jsbase64.decode(s_jsbase64)
fs.writeFileSync('jsbase64_' + process.argv[2], s_jsbase64_back, 'binary')
$ node ~/scripts/base64.js helloWorld-10MB.png
s 10000000
s_jsbase64 20004252
s_base64 13333336
$ ls -l *helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec 7 23:09 base64_helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec 7 23:09 helloWorld-10MB.png
-rw-r--r-- 1 mkm staff 10000000 Dec 7 23:09 jsbase64_helloWorld-10MB.png
$ cmp helloWorld-10MB.png base64_helloWorld-10MB.png
$ cmp helloWorld-10MB.png jsbase64_helloWorld-10MB.png
I noticed that in your code you reference the wiki page on base64, which clearly states that base64 needs 33% more memory. But I've shown you that your jsbase64 uses approximately double memory in my example above. Can you speak to this issue?
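The identical files are not surprising even though the js-base64 output is longer. If encode() UTF-8 encodes the string before Base64 encoding it, as the maintainer's replies in this thread describe, then decode() undoes both steps and the original characters come back. A rough sketch of that round trip with Node's Buffer only (an imitation under that assumption, not a call into js-base64):

```javascript
// Stand-in for file contents read with fs.readFileSync(path, 'binary').
const original = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0xff, 0x00, 0x80])
const binaryString = original.toString('latin1')

// Encode: string -> UTF-8 bytes -> Base64 (assumed model of Base64.encode).
const encoded = Buffer.from(binaryString, 'utf8').toString('base64')

// Decode: Base64 -> UTF-8 bytes -> string (assumed model of Base64.decode).
const decodedString = Buffer.from(encoded, 'base64').toString('utf8')

// Writing the string with the 'binary' encoding maps each character back to one byte.
const restored = Buffer.from(decodedString, 'latin1')

console.log(restored.equals(original))        // true
console.log(original.length, encoded.length)  // 7 16
```

The round trip is lossless, but the intermediate Base64 text carries the UTF-8 overhead: 7 input bytes, 3 of them at or above 0x80, become 10 UTF-8 bytes and 16 Base64 characters.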
"I don't understand what you're suggesting. Are you suggesting that base64.encode() does not work on binary data?"
No, it does not. It is for UTF-8 strings, and a string is not binary. Base64.encode() was born before Uint8Array. There were only atob() and btoa(), which could not handle UTF-8 correctly.
And the reason Base64.encode() emits more bytes than Base64.btoa() is that it first encodes the string to binary (as UTF-8) before encoding that to Base64.
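The numbers reported earlier in the thread can be checked against this explanation. In UTF-8, characters 0x00..0x7F take one byte and characters 0x80..0xFF take two, so a binary string of n characters with h of them at or above 0x80 becomes n + h bytes and then 4 * ceil((n + h) / 3) Base64 characters. A small sketch, assuming that model (predictEncodeLength is a hypothetical helper, not a js-base64 function):

```javascript
// predictEncodeLength is a hypothetical helper for illustration only.
// Bytes 0x00..0x7F stay 1 byte in UTF-8; bytes 0x80..0xFF become 2 bytes.
function predictEncodeLength(bytes) {
  let highBytes = 0
  for (const b of bytes) {
    if (b >= 0x80) highBytes++
  }
  const utf8Length = bytes.length + highBytes
  return 4 * Math.ceil(utf8Length / 3)
}

// Every byte value once: 256 bytes, 128 of them >= 0x80.
const allBytes = Buffer.from(Array.from({ length: 256 }, (_, i) => i))
const actual = Buffer.from(allBytes.toString('latin1'), 'utf8').toString('base64').length

console.log(predictEncodeLength(allBytes), actual) // 512 512
console.log(allBytes.toString('base64').length)    // 344
```

For uniformly random data about half the bytes are 0x80 or above, so the UTF-8 form is about 1.5 n bytes and the Base64 text about 2 n characters, which is consistent with the 2000016 characters reported for the 1000000-byte file. A string of only the letter 'a' has no such bytes, so both encoders give the same length.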
Please define what you mean by "binary".
Exactly as ECMAScript defines binary. If you do not understand what I am talking about, further study the history of JS. Buffer of node.js may also help.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer
https://nodejs.org/api/buffer.html#buffers-and-typedarrays
I've been programming since long before JavaScript was invented, sir. The term "binary" data is more general than that definition. It simply means bytes with the 8th bit set on, in general. In any case, that is all I meant when I said "binary" data. But we digress. The issue I am posting about is not about the definition of binary data.
I have shown that base64 works to encode my data with only ~33-37% additional space. While your jsbase64 takes 100% additional memory. Can you address this?
Here is an even simpler example with only 100 bytes:
$ cat scripts/base64.js
#!/usr/bin/env node
//
const fs = require('fs')
const base64 = require('base-64')
const jsbase64 = require('js-base64')
// The string s contains binary data. bit 8 is set on some of the bytes, see below
let s = fs.readFileSync(process.argv[2], 'binary')
const s_jsbase64 = jsbase64.encode(s)
const s_base64 = base64.encode(s)
console.log('s ', s.length)
console.log('s_jsbase64', s_jsbase64.length)
console.log('s_base64 ', s_base64.length)
const s_base64_back = base64.decode(s_base64)
fs.writeFileSync('base64_' + process.argv[2], s_base64_back, 'binary')
const s_jsbase64_back = jsbase64.decode(s_jsbase64)
fs.writeFileSync('jsbase64_' + process.argv[2], s_jsbase64_back, 'binary')
#
# create a file of 100 bytes of random data, it contains 8 bit bytes. I consider this a binary file.
#
$ dd if=/dev/urandom of=100b bs=100 count=1
1+0 records in
1+0 records out
100 bytes copied, 0.000135 s, 741 kB/s
$ ls -l 100b
-rw-r--r-- 1 mkm staff 100 Dec 8 00:02 100b
#
# you can see the bytes, some are 8 bit set bytes;
#
$ od -c 100b
0000000 ? ? ? n ' ? l Y > _ ? ? 210 223 ? 035
0000020 ? " 230 ; ? _ l 6 ? 025 ^ ? ? t ? 3
0000040 R ? ? ? F ? ? 8 002 \t ? x d [ % ?
0000060 024 O 177 ? 3 ? ? 037 ? 5 ? ? ? ? 234 036
0000100 e ? 027 ? 027 6 021 ? 3 ? ? t ? ? ? ?
0000120 w 231 ? 203 211 ? ? ? + ? 200 237 \v t p &
0000140 i m ? I
#
# now run the script. It shows that jsbase64 take 100% additional memory;
#
$ node ~/scripts/base64.js 100b
s 100
s_jsbase64 204
s_base64 136
$ ls -l *100b
-rw-r--r-- 1 mkm staff 100 Dec 8 00:02 100b
-rw-r--r-- 1 mkm staff 100 Dec 8 00:02 base64_100b
-rw-r--r-- 1 mkm staff 100 Dec 8 00:02 jsbase64_100b
#
# cmp compares two files. If they are different it prints a message. These 3 files are identical;
$ cmp 100b base64_100b
$ cmp 100b jsbase64_100b
BTW that mozilla.org page does NOT define binary data. It defines Array Buffer.
Can you address this?
I cannot address what is already addressed. This module already has .btoa() and .fromUint8Array() for that.
You are simply using this module wrong. Case dismissed.
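For anyone landing here with the same question, a minimal sketch of the bytes-in approach: keep the file contents as a Buffer (a Uint8Array subclass) instead of reading them into a string, and encode the bytes directly. The example uses Node built-ins only, with random bytes standing in for a file; the js-base64 calls named in this thread are mentioned in a comment but not executed.

```javascript
const crypto = require('crypto')

// Stand-in for fs.readFileSync(path) with no encoding argument, which returns a Buffer.
const bytes = crypto.randomBytes(100)

// Encode the bytes directly; no string (and so no UTF-8 step) is involved.
const encoded = bytes.toString('base64')
const decoded = Buffer.from(encoded, 'base64')

console.log(bytes.length, encoded.length) // 100 136
console.log(decoded.equals(bytes))        // true

// With js-base64 installed, the calls named in this thread would be
// Base64.fromUint8Array(bytes) or Base64.btoa(bytes.toString('binary'));
// they are not executed here.
```

Either way the input is treated as bytes, so 100 bytes come out as 136 characters, the same size the base-64 package produced in the 100-byte test.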
base64 is supposed to increase data size by 33% - 37%, https://en.wikipedia.org/wiki/Base64
This version doubles the data size! Apparently it is a naive version of base64.