denoland / deno

A modern runtime for JavaScript and TypeScript.
https://deno.com
MIT License

`stdin.read` (and `stdin.readSync`) corrupt non-ASCII input on Windows #18240

Open lionel-rowe opened 1 year ago

lionel-rowe commented 1 year ago

stdin.read (and stdin.readSync) corrupt non-ASCII input on Windows.

To reproduce:

```ts
// Read up to 6 bytes from stdin and log the raw buffer
const c = new Uint8Array(6);
Deno.stdin.read(c).then(() => console.log(c));
```

Then, enter a non-ASCII character. The resulting bytes will be corrupted on Windows.

Examples, with trailing LF/CRLF/null bytes truncated:

| Input | Expected | Actual | Actual decoded as UTF-8 |
|---|---|---|---|
| ÿ | [195, 191] | [152] | "�" (invalid char) |
| Ā | [196, 128] | [65] | "A" |
| ā | [196, 129] | [97] | "a" |
| 啊 | [229, 149, 138] | [63] | "?" |
| 🦄 | [240, 159, 166, 132] | [63, 63] | "??" |

Expected results are the UTF-8 bytes. Results on Linux are as expected.
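The Expected column and the last column of the table can be checked with the standard `TextEncoder` and `TextDecoder` APIs. A minimal sketch follows; the helper names `expectedBytes` and `decodeAsUtf8` are made up for this example.

```typescript
// Expected column: the UTF-8 encoding of the typed character.
function expectedBytes(ch: string): number[] {
  return Array.from(new TextEncoder().encode(ch));
}

// Last column: the bytes Windows returned, decoded as UTF-8.
// TextDecoder defaults to UTF-8 and substitutes U+FFFD for invalid input.
function decodeAsUtf8(bytes: number[]): string {
  return new TextDecoder().decode(new Uint8Array(bytes));
}

for (const ch of ["ÿ", "Ā", "ā", "啊", "🦄"]) {
  console.log(ch, expectedBytes(ch));
}
// A lone 0x98 (152) is a UTF-8 continuation byte, so it decodes to U+FFFD.
console.log(decodeAsUtf8([152]));
```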

njhanley commented 1 year ago

This stems from Deno reading directly from console input as a file, which uses the console's current code page rather than Unicode (see High-Level Console I/O). In the example, 'ÿ' maps to 152 in code page 437 (OEM-US).
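To illustrate why each character collapses to at most one byte, here is a toy model of reading input through a single-byte code page. It is an assumption-laden sketch, not the actual Windows algorithm: the map contains only the mappings observed in this issue (152 for 'ÿ' in code page 437, plus the best-fit results for 'Ā' and 'ā'), every other non-ASCII character becomes '?' (63), and it works per UTF-16 code unit because that reproduces the [63, 63] seen for 🦄. `toCodePage437Model` is a hypothetical name.

```typescript
// Toy model of reading console input through code page 437.
// ASSUMPTION: only the mappings observed in this issue are included;
// real code pages and Windows' best-fit rules cover far more characters.
const OBSERVED_MAPPINGS = new Map<string, number>([
  ["ÿ", 152], // 'ÿ' is byte 0x98 in code page 437
  ["Ā", 65], // observed best-fit to 'A'
  ["ā", 97], // observed best-fit to 'a'
]);
const QUESTION_MARK = 63; // '?', substituted for unmappable characters

function toCodePage437Model(input: string): number[] {
  const out: number[] = [];
  // Walk UTF-16 code units rather than code points: this reproduces the
  // observed [63, 63] for 🦄, which is stored as a surrogate pair.
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    if (unit < 0x80) {
      out.push(unit); // ASCII bytes are the same in code page 437
      continue;
    }
    out.push(OBSERVED_MAPPINGS.get(input[i]) ?? QUESTION_MARK);
  }
  return out;
}

console.log(toCodePage437Model("ÿ"));
```

Every UTF-16 code unit produces exactly one byte in this model, so a multi-byte UTF-8 sequence can never come out of this path, which matches the Actual column.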

Instead, ReadConsoleW should be used, as in Rust's std::io::Stdin.
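With ReadConsoleW, the console returns UTF-16 code units, so a reader built on it still has to transcode them to UTF-8 before handing bytes to JavaScript. The sketch below shows only that conversion step, assuming the code units have already been read; `utf16UnitsToUtf8` is a hypothetical helper, not a Deno or Rust API, and it does not handle a surrogate pair split across two reads.

```typescript
// Hypothetical helper: convert UTF-16 code units (as ReadConsoleW would
// return them) into UTF-8 bytes.
function utf16UnitsToUtf8(units: Uint16Array): Uint8Array {
  // String.fromCharCode takes UTF-16 code units; a surrogate pair such as
  // 0xD83E 0xDD84 becomes a single astral character. Spreading is fine for
  // small console buffers but not for very large arrays.
  const text = String.fromCharCode(...Array.from(units));
  // TextEncoder always emits UTF-8. A lone surrogate (e.g. half of a pair
  // split across two reads) would be encoded as U+FFFD here.
  return new TextEncoder().encode(text);
}
```

For example, 🦄 arrives as the surrogate pair 0xD83E 0xDD84, and `utf16UnitsToUtf8(new Uint16Array([0xD83E, 0xDD84]))` yields the four bytes [240, 159, 166, 132] from the Expected column.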

@dsherret Is there a reason Deno doesn't use Rust/Tokio's stdio implementation? If not, would a PR be welcome?

dsherret commented 1 year ago

Yes, a PR would be welcome. I believe it should use std::io::stdin here when the resource kind is StdFileResourceKind::Stdin:

https://github.com/denoland/deno/blob/3cd7abf73fa104526508984daef54bbb8e120310/ext/io/lib.rs#L398-L400

Similar to how it does this for write:

https://github.com/denoland/deno/blob/3cd7abf73fa104526508984daef54bbb8e120310/ext/io/lib.rs#L377-L383

Mqxx commented 1 month ago

Hey, I just stumbled across this issue because I'm having the same problem. Non-ASCII characters like ÄÖÜ are corrupted. Is there any update on when this will be fixed?

Thanks