denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

Bug: `Buffer` fails to encode ascii string to base64 #24908

Open marvinhagemeister opened 3 months ago

marvinhagemeister commented 3 months ago

It seems like a specially crafted string can cause an error to be thrown when creating a Node `Buffer` from it with the `base64` encoding.

Steps to reproduce

Run this file:

import { Buffer } from "node:buffer";
Buffer.from("base64-encoded-bytes-from-browser", "base64");

Output (Deno 1.45.5):

error: Uncaught (in promise) InvalidCharacterError: Failed to decode base64
Buffer.from("base64-encoded-bytes-from", "base64");
       ^
    at base64Write (ext:deno_node/internal_binding/_utils.ts:30:12)
    at Buffer.base64Write_ [as base64Write] (ext:deno_node/internal/buffer.mjs:726:10)
    at Object.write (ext:deno_node/internal/buffer.mjs:2177:46)
    at fromStringFast (ext:deno_node/internal/buffer.mjs:255:22)
    at fromString (ext:deno_node/internal/buffer.mjs:276:10)
    at _from (ext:deno_node/internal/buffer.mjs:160:12)
    at Function.from (ext:deno_node/internal/buffer.mjs:198:10)
    at file:///Users/marvinh/dev/test/deno-buffer/foo.mjs:2:8

Output (Deno git 86e219c50):

error: Uncaught (in promise) InvalidCharacterError: Failed to decode base64
Buffer.from("base64-encoded-bytes-from-browser", "base64");
       ^
    at forgivingBase64Decode (ext:deno_web/00_infra.js:257:10)
    at base64ToBytes (ext:deno_node/internal_binding/_utils.ts:26:12)
    at Uint8Array.base64Write (ext:deno_node/internal/buffer.mjs:672:21)
    at Object.write (ext:deno_node/internal/buffer.mjs:2123:46)
    at Uint8Array.write (ext:deno_node/internal/buffer.mjs:794:14)
    at fromString (ext:deno_node/internal/buffer.mjs:216:22)
    at _from (ext:deno_node/internal/buffer.mjs:124:12)
    at Function.from (ext:deno_node/internal/buffer.mjs:162:10)
    at file:///Users/marvinh/dev/test/deno-buffer/foo.mjs:2:8

Version: Deno 1.45.5

marvinhagemeister commented 3 months ago

Seems like it's failing on the Rust side here https://github.com/denoland/deno/blob/86e219c509d065a140b7cd9758dcf1bee1e78db2/ext/web/lib.rs#L172-L174

devsnek commented 3 months ago

should it be STANDARD_NO_PAD?

lucacasonato commented 3 months ago

@marvinhagemeister that linked code is for encoding, not decoding. Is `base64-encoded-bytes-from-browser` valid base64?

littledivy commented 3 months ago

Maybe caused by https://github.com/denoland/deno/pull/24346

marvinhagemeister commented 3 months ago

Seems like this relates to the length of the string. This is another repro:

import { Buffer } from "node:buffer";

for (let i = 0; i < 50; i++) {
  try {
    const str = "a".repeat(i);
    Buffer.from(str, "base64");
  } catch (_) {
    console.log(`Failed with ${i} chars`);
  }
}

Output:

Failed with 5 chars
Failed with 9 chars
Failed with 13 chars
Failed with 17 chars
Failed with 21 chars
Failed with 25 chars
Failed with 29 chars
Failed with 33 chars
Failed with 37 chars
Failed with 41 chars
Failed with 45 chars
Failed with 49 chars

lucacasonato commented 3 months ago

@marvinhagemeister It seems Node is just very loosey-goosey about its base64 decoding. It will happily decode this string, for example, even though it's definitely not valid: "abc123!@#$%^&*()". And the reason it's related to the length is that a 5-char base64 string is just not valid at all. Node seems to ignore this fact and drop the 5th char entirely:

> Buffer.from("abcde", "base64").toString("base64");
'abcd'
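
A minimal sketch of the leniency described here, for illustration only. `normalizeLenientBase64` is a hypothetical helper, not a Deno or Node API, and the rule that every non-alphabet character (including `=`) is simply dropped is an assumption rather than Node's confirmed algorithm. The URL-safe mapping reflects Node's documented behavior that the `base64` encoding also accepts the URL and filename safe alphabet.

```javascript
// Hypothetical helper (not a Deno or Node API): rewrites lenient base64
// input into a padded string whose length is a multiple of 4.
function normalizeLenientBase64(input) {
  let cleaned = input
    // Node documents that "base64" also accepts the URL-safe alphabet.
    .replace(/-/g, "+")
    .replace(/_/g, "/")
    // Assumption: every other character outside the alphabet, including
    // "=" padding, is simply dropped.
    .replace(/[^A-Za-z0-9+\/]/g, "");

  // A single leftover character carries only 6 bits, which is less than
  // one byte, so it is dropped (the "5th char" case).
  if (cleaned.length % 4 === 1) {
    cleaned = cleaned.slice(0, -1);
  }

  // Re-add the "=" padding up to the next multiple of 4.
  return cleaned.padEnd(Math.ceil(cleaned.length / 4) * 4, "=");
}

console.log(normalizeLenientBase64("abcde"));
console.log(normalizeLenientBase64("abc123!@#$%^&*()"));
```

This only fixes up the alphabet, length, and padding. Leftover bits in the final character are not canonicalized, so a decoder that rejects non-zero trailing bits would still need that check relaxed.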

marvinhagemeister commented 3 months ago

@lucacasonato good find. It seems to sometimes cut off the string and sometimes pad it.

input:  
result: 

input:  a
result: 

input:  aa
result: aQ==

input:  aaa
result: aaY=

input:  aaaa
result: aaaa

input:  aaaaa
result: aaaa

input:  aaaaaa
result: aaaaaQ==

input:  aaaaaaa
result: aaaaaaY=

input:  aaaaaaaa
result: aaaaaaaa

input:  aaaaaaaaa
result: aaaaaaaa

input:  aaaaaaaaaa
result: aaaaaaaaaQ==

input:  aaaaaaaaaaa
result: aaaaaaaaaaY=

input:  aaaaaaaaaaaa
result: aaaaaaaaaaaa

input:  aaaaaaaaaaaaa
result: aaaaaaaaaaaa

input:  aaaaaaaaaaaaaa
result: aaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaQ==

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaY=

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

input:  aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
result: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
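
The pattern in this table can be reproduced without `Buffer` at all, which may help explain it. Each base64 character carries 6 bits, so a trailing group of 1, 2, or 3 characters leaves 6, 4, or 2 bits that do not fill a byte. The sketch below assumes a lenient decoder that discards those bits and a canonical encoder that writes zeros in their place. `lenientDecode`, `encode`, and `roundTrip` are hypothetical names, not Deno or Node internals.

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Illustrative lenient decoder: emits whole bytes only.
function lenientDecode(input) {
  const bytes = [];
  let bits = 0; // accumulator holding bits not yet emitted
  let bitCount = 0; // how many bits of the accumulator are valid
  for (const ch of input) {
    const value = ALPHABET.indexOf(ch);
    if (value === -1) continue; // assumption: skip anything outside the alphabet
    bits = (bits << 6) | value;
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      bytes.push((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1; // keep only the bits not yet emitted
    }
  }
  // Whatever remains (2, 4, or 6 bits) is less than a byte and is discarded.
  return bytes;
}

// Canonical encoder: unused trailing bits are written as zeros, then padded.
function encode(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const chunk =
      (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    out += ALPHABET[(chunk >> 18) & 63] + ALPHABET[(chunk >> 12) & 63];
    out += i + 1 < bytes.length ? ALPHABET[(chunk >> 6) & 63] : "=";
    out += i + 2 < bytes.length ? ALPHABET[chunk & 63] : "=";
  }
  return out;
}

const roundTrip = (input) => encode(lenientDecode(input));

for (let i = 0; i <= 6; i++) {
  const input = "a".repeat(i);
  console.log(`input:  ${input}\nresult: ${roundTrip(input)}\n`);
}
```

Under these assumptions, lengths with remainder 1 lose their last character because 6 bits never complete a byte, while remainders 2 and 3 change their last character because the discarded bits come back as zeros (`aa` becomes `aQ==`, `aaa` becomes `aaY=`).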

bartlomieju commented 3 months ago

@marvinhagemeister is there a particular package that fails to work because of this behavior? The conclusion here was that Node.js behavior seems strange/incorrect because invalid base64 chars are ignored. That said, if some packages rely on this behavior, we do need to mimic it.

marvinhagemeister commented 3 months ago

@bartlomieju Ask @thisisjofrank for her LED matrix repo. That's where I encountered it in the test suite. I don't have the link at hand at the moment.