dotnet / runtime

Various non-control characters are interpreted as control characters by the console. #80644

Open ericwj opened 1 year ago

ericwj commented 1 year ago

Description

The following character is erroneously interpreted as the escape character U+001B:

← Leftwards Arrow U+2190

As a result, instead of seeing a left arrow in the console as expected, I can write ANSI escape sequences using this non-control character. The obvious downside is that I cannot display the left arrow character in the console.

More generally

More generally, instead of printing the IBM-437 glyphs for the Unicode code points listed below, the console interprets these code points as the C0 control characters 1-31.

By typing Alt+1 through Alt+31 I can produce the IBM-437 glyphs that DOS displayed for the control characters when the bytes 1-31 were written directly to video memory: ☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼

GitHub apparently renders them quite colorfully:

☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼

These are however the following Unicode code points (from 1 to 31):

       \u263a \u263b \u2665 \u2666 \u2663 \u2660 \u2022 \u25d8 \u25cb \u25d9 \u2642 \u2640 \u266a \u266b \u263c
\u25ba \u25c4 \u2195 \u203c \u00b6 \u00a7 \u25ac \u21a8 \u2191 \u2193 \u2192 \u2190 \u221f \u2194 \u25b2 \u25bc

The characters for Alt+7 through Alt+15, and for Alt+27, are not displayed correctly.

These correspond to \a\b\t\n\v\f\r (\u0007-\u000d) and escape (\u001b). The other two are \u000e (SO, Shift Out) and \u000f (SI, Shift In).
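The correspondence described above can be written out as a small table. This is only a sketch: the code points are copied from the list in this issue, the split into affected and unaffected bytes follows the report above rather than any documented console behavior, and `IsReportedBroken` is a hypothetical helper name introduced for illustration.

```csharp
using System;

// Code points from the issue, in order: entry i is the IBM-437 glyph for byte i + 1.
char[] glyphs =
{
    '\u263a', '\u263b', '\u2665', '\u2666', '\u2663', '\u2660', '\u2022', '\u25d8',
    '\u25cb', '\u25d9', '\u2642', '\u2640', '\u266a', '\u266b', '\u263c', '\u25ba',
    '\u25c4', '\u2195', '\u203c', '\u00b6', '\u00a7', '\u25ac', '\u21a8', '\u2191',
    '\u2193', '\u2192', '\u2190', '\u221f', '\u2194', '\u25b2', '\u25bc',
};

// Hypothetical helper: true for the byte values this issue reports as
// misbehaving (Alt+7 through Alt+15, and Alt+27).
static bool IsReportedBroken(int b) => (b >= 0x07 && b <= 0x0f) || b == 0x1b;

for (var i = 0; i < glyphs.Length; i++)
{
    var b = i + 1;
    var note = IsReportedBroken(b)
        ? $"reported as acting like U+{b:X4}"
        : "not reported as affected";
    // Only ASCII is written, so the listing itself cannot trigger the behavior.
    Console.WriteLine($"0x{b:x2}  U+{(int)glyphs[i]:X4}  {note}");
}
```

Printing the code points as hex instead of the glyphs themselves keeps the output readable on a console that shows the problem.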

Reproduction Steps

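// Repro: write U+2190 where an ESC (U+001B) would normally start an ANSI sequence.
Console.Title = "ANSI but not ANSI";
var s = "\u2190";
var m = $"\\u{(int)s[0]:x4}: {s}[9mSTRIKETHROUGH???{s}[29m";
// m[8] is the first arrow in the string; print its UTF-16 value to show it is 0x2190.
var h = (int)m[8];
Console.WriteLine($"0x{h:x4}");
Console.WriteLine(m);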
s = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼";
for (var i = 0; i < s.Length; i++)
    Console.WriteLine($"{i + 1:x2}: \\u{(int)s[i]:x4} {s[i]}←[0m");

Actual behavior

image image

Expected behavior

PS C:\>
>> $s = [string][char]0x2190
>> $m = "\u{0:x4}: {1}[9mSTRIKETHROUGH???{1}[29m" -f @([int]$s[0], $s)
>> $h = [int]$m[8]
>> Write-Host ("0x{0:x4}" -f $h)
>> Write-Host $m
0x2190
\u2190: ←[9mSTRIKETHROUGH???←[29m
PS C:\> Write-Host "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"
☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼
PS C:\> [System.IO.File]::WriteAllText("C:\test.txt", "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼", [System.Text.Encoding]::Unicode)
image

Regression?

I saw this a few years ago as well, when source generators were new, on my old computer that was running Windows 10 at the time.

Known Workarounds

Nope, I think I am stuck with just ↑↓→ as far as those arrows are concerned.

Configuration

Windows Terminal 1.15.3466.0
Windows 11 10.0.22621.1105

PSVersion 7.3.1

.NET SDK:
 Version:   7.0.101
 Commit:    bb24aafa11

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.22621
 OS Platform: Windows
 RID:         win10-x64
 Base Path:   C:\Program Files\dotnet\sdk\7.0.101\

Host:
  Version:      7.0.1
  Architecture: x64
  Commit:       97203d38ba

Other information

One could argue that it is translating the Unicode text to a single-byte code page, but then I seriously don't understand why I get ANSI VT processing without doing anything. If someone called the appropriate APIs to enable VT processing by default, why didn't they also enable Unicode? AFAIK both are compat-breaking changes for old console apps.
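One way to probe the code-page hypothesis is to inspect and change `Console.OutputEncoding`. The sketch below assumes, without confirmation from this thread, that the arrow is being translated through a single-byte output code page. If that assumption holds, switching to UTF-8 might let U+2190 through as a literal arrow. The outcome is not verified here.

```csharp
using System;
using System.Text;

// Report the encoding the console output stream currently uses. A single-byte
// OEM code page here would be consistent with the translation hypothesis
// (an assumption, not something established in this issue).
var before = Console.OutputEncoding;
Console.WriteLine(
    $"Before: {before.WebName} (code page {before.CodePage}, single-byte: {before.IsSingleByte})");

// Switch console output to UTF-8 and write the same test string as the repro.
Console.OutputEncoding = Encoding.UTF8;
Console.WriteLine($"After:  {Console.OutputEncoding.WebName}");
Console.WriteLine("\u2190[9mSTRIKETHROUGH???\u2190[29m");
```

If the last line still renders as struck-through text after the switch, the single-byte translation explanation would look less likely and the cause would have to be somewhere else.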

dotnet-issue-labeler[bot] commented 1 year ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 1 year ago

Tagging subscribers to this area: @dotnet/area-system-console See info in area-owners.md if you want to be subscribed.

Issue Details
Author: ericwj
Assignees: -
Labels: `area-System.Console`, `untriaged`
Milestone: -

davidly commented 7 months ago

I really hope this gets fixed. The current behavior is broken.

jcrben commented 6 months ago

Maybe this could fix https://github.com/PowerShell/PSReadLine/issues/105 ?

I've got a Dell Latitude 5420 around here which prints nothing when I run [Console]::ReadKey() and then press Alt+Left or Alt+Right.