Closed fcharlie closed 4 years ago
This probably has the same underlying root cause as msys2/MSYS2-packages#1974. Could you call cmd /c chcp 65001
and try again?
Oh, Windows Terminal... What is the value of LANG
? And of LC_ALL
and LC_CTYPE
?
Oh, Windows Terminal... What is the value of ? And of and ?
LANG``LC_ALL``LC_CTYPE
Windows Terminal(Powershell) not set LANG
LC_ALL
and LC_CTYPE
After LANG is set, the display is normal, which seems to be a duplicate problem.
Path Option: BashOnly
This suggests that git
should not even be found when you try to call it in PowerShell. However, it is obviously found. Which makes me wonder, where it is found. In Bash, of course, LANG
is configured in /etc/profile.d/lang.sh
, which is why it works there.
I vividly remember that we tried, once upon a time, to configure LC_CTYPE=C.UTF-8
as a fall-back and that really messed up some use cases, which is why we now fall back to LC_CTYPE=C
: https://github.com/git-for-windows/git/blob/77982caf269b7ee713a76da2bcf260c34d3bf7a7/compat/mingw.c#L3372-L3373
Not sure whether there is anything we can do to help this situation better, not if we want to avoid similar breakages...
Just to make sure, could you re-test with the latest snapshot, please?
@dscho Thanks. It seems that git diff is working properly.
download: https://wingit.blob.core.windows.net/files/MinGit-prerelease-2.28.0.windows.1-51-g87c86a26de-20200923135803-64-bit.zip (missing less)
https://wingit.blob.core.windows.net/files/PortableGit-prerelease-2.28.0.windows.1-51-g87c86a26de-20200923135803-64-bit.7z.exe Not working properly
Could you push that cjk
repository somewhere, for easy testing?
Could you push that
cjk
repository somewhere, for easy testing?
Open Windows Terminal (PowerShell Core) :
git clone https://github.com/fcharlie/cjkdemo.git
cd cjkdemo
git diff f789f9f390c80ce6cc59b7770fe833554140d83f
Okay, I gave it a try and could verify that:
git -P diff f789f9
works, suggesting that the problem lies with the pager, andLC_ALL=C.UTF-8
, it works as expected.@fcharlie I think I have half a fix here: https://github.com/git-for-windows/git/pull/2834
I think I have half a fix here: #2834
@fcharlie I hope you have a good idea how to fix the other half because I sure don't.
@dscho I have a solution here. In git wapper, we always set LANG to **.UTF-8 according to the user's locale when LANG is not set.
I have a solution here. In git wapper, we always set LANG to **.UTF-8 according to the user's locale when LANG is not set.
Unfortunately, I do not remember the details, and failed to dig up the context, but I vividly remember that we tried that and that it wreaked havoc.
We cannot just set LANG=**.UTF-8
unless we know that the code page supports it.
Unfortunately, Windows Terminal does not set the code page to UTF-8, otherwise my fix would work without any additional work.
I have a solution here. In git wapper, we always set LANG to **.UTF-8 according to the user's locale when LANG is not set.
Unfortunately, I do not remember the details, and failed to dig up the context, but I vividly remember that we tried that and that it wreaked havoc.
We cannot just set unless we know that the code page supports it.
LANG=**.UTF-8
Unfortunately, Windows Terminal does not set the code page to UTF-8, otherwise my fix would work without any additional work.
Maybe we should reimplement functions like fprintf in git in Windows to support git encoding completely as UTF-8, and then let all be processed in accordance with UTF-8, no longer consider the LANG setting. To be honest, I think that now to evaluate CodePage, this thing is completely dross.
File not tty: fprintf -- UTF-8 --> WriteFile File is console: fprintf -- UTF-8 to UTF-16 --> WriteConsoleW
I saw that git has a similar mechanism, but it feels a bit confusing.
void winansi_init(void)
{
int con1, con2;
wchar_t name[32];
/* check if either stdout or stderr is a console output screen buffer */
con1 = is_console(1);
con2 = is_console(2);
/* Also compute console bit for fd 0 even though we don't need the result here. */
is_console(0);
if (!con1 && !con2) {
#ifdef DETECT_MSYS_TTY
/* check if stdin / stdout / stderr are MSYS2 pty pipes */
detect_msys_tty(0);
detect_msys_tty(1);
detect_msys_tty(2);
#endif
return;
}
/* create a named pipe to communicate with the console thread */
if (swprintf(name, ARRAY_SIZE(name) - 1, L"\\\\.\\pipe\\winansi%lu",
GetCurrentProcessId()) < 0)
die("Could not initialize winansi pipe name");
hwrite = CreateNamedPipeW(name, PIPE_ACCESS_OUTBOUND,
PIPE_TYPE_BYTE | PIPE_WAIT, 1, BUFFER_SIZE, 0, 0, NULL);
if (hwrite == INVALID_HANDLE_VALUE)
die_lasterr("CreateNamedPipe failed");
hread = CreateFileW(name, GENERIC_READ, 0, NULL, OPEN_EXISTING, 0, NULL);
if (hread == INVALID_HANDLE_VALUE)
die_lasterr("CreateFile for named pipe failed");
/* start console spool thread on the pipe's read end */
hthread = CreateThread(NULL, 0, console_thread, NULL, 0, NULL);
if (hthread == INVALID_HANDLE_VALUE)
die_lasterr("CreateThread(console_thread) failed");
/* schedule cleanup routine */
if (atexit(winansi_exit))
die_errno("atexit(winansi_exit) failed");
/* redirect stdout / stderr to the pipe */
if (con1)
hconsole1 = swap_osfhnd(1, duplicate_handle(hwrite));
if (con2)
hconsole2 = swap_osfhnd(2, duplicate_handle(hwrite));
}
Implementation of golang:
// Write implements io.Writer.
func (fd *FD) Write(buf []byte) (int, error) {
if err := fd.writeLock(); err != nil {
return 0, err
}
defer fd.writeUnlock()
if fd.isFile {
fd.l.Lock()
defer fd.l.Unlock()
}
ntotal := 0
for len(buf) > 0 {
b := buf
if len(b) > maxRW {
b = b[:maxRW]
}
var n int
var err error
if fd.isFile {
switch fd.kind {
case kindConsole:
n, err = fd.writeConsole(b)
default:
n, err = syscall.Write(fd.Sysfd, b)
if fd.kind == kindPipe && err == syscall.ERROR_OPERATION_ABORTED {
// Close uses CancelIoEx to interrupt concurrent I/O for pipes.
// If the fd is a pipe and the Write was interrupted by CancelIoEx,
// we assume it is interrupted by Close.
err = ErrFileClosing
}
}
if err != nil {
n = 0
}
} else {
if race.Enabled {
race.ReleaseMerge(unsafe.Pointer(&ioSync))
}
o := &fd.wop
o.InitBuf(b)
n, err = execIO(o, func(o *operation) error {
return syscall.WSASend(o.fd.Sysfd, &o.buf, 1, &o.qty, 0, &o.o, nil)
})
}
ntotal += n
if err != nil {
return ntotal, err
}
buf = buf[n:]
}
return ntotal, nil
}
// writeConsole writes len(b) bytes to the console File.
// It returns the number of bytes written and an error, if any.
func (fd *FD) writeConsole(b []byte) (int, error) {
n := len(b)
runes := make([]rune, 0, 256)
if len(fd.lastbits) > 0 {
b = append(fd.lastbits, b...)
fd.lastbits = nil
}
for len(b) >= utf8.UTFMax || utf8.FullRune(b) {
r, l := utf8.DecodeRune(b)
runes = append(runes, r)
b = b[l:]
}
if len(b) > 0 {
fd.lastbits = make([]byte, len(b))
copy(fd.lastbits, b)
}
// syscall.WriteConsole seems to fail, if given large buffer.
// So limit the buffer to 16000 characters. This number was
// discovered by experimenting with syscall.WriteConsole.
const maxWrite = 16000
for len(runes) > 0 {
m := len(runes)
if m > maxWrite {
m = maxWrite
}
chunk := runes[:m]
runes = runes[m:]
uint16s := utf16.Encode(chunk)
for len(uint16s) > 0 {
var written uint32
err := syscall.WriteConsole(fd.Sysfd, &uint16s[0], uint32(len(uint16s)), &written, nil)
if err != nil {
return 0, err
}
uint16s = uint16s[written:]
}
}
return n, nil
}
We already write bytes to files as-are, without re-encoding. The winansi code you found targets the Win32 Console and needs to re-encode (in particular ANSI control sequences).
That still does not answer the question how we could discern the case where Windows Terminal pretends not to handle Unicode but actually does handle Unicode.
@dscho Windows Terminal is no different from the old Console, except that it can handle ANSI better. Therefore, Git CMD will also have this problem. In addition, the internal behavior of git is not consistent. For example, git commit can be displayed correctly, but git log/git diff cannot
I use the blobview program written by golang to check the commit, but it is actually UTF-8
So there may be two problems here. The first is the inconsistency of git commit/git diff/git log processing, and the second is my previous reply. Git follows the old locale setting. The complexity of this setting makes it easy to appear problem.
Of course, I think this is a universal problem with a lot of historical baggage, and it is good to be able to let go of these baggage.
git commit can be displayed correctly, but git log/git diff cannot
That's because git commit
does not use the pager, but git log
/git diff
do. And that pager is less.exe
provided by the MSYS2 project, which really needs to know the output encoding (because it does not take git.exe
's shortcuts that may or may not be appropriate).
git commit can be displayed correctly, but git log/git diff cannot
That's because does not use the pager, but / do. And that pager is provided by the MSYS2 project, which really needs to know the output encoding (because it does not take 's shortcuts that may or may not be appropriate).
git commit``git log``git diff``less.exe``git.exe
So can we reimplement less?
So can we reimplement less?
Oh my. If you want to. I'll not get into this, though...
So can we reimplement less?
Oh my. If you want to. I'll not get into this, though...
Or patch less?
Or patch less?
Or patch the MSYS2 runtime.
But how? That is the big question.
Because you can get less.exe
to work correctly, by setting LC_CTYPE
(or LANG
) in a specific way. As my PR demonstrates, we can do that even in git.exe
. But the biggest problem remains: how to discern the situations when it is appropriate (when the code page is UTF-8) vs when it is not appropriate (IIRC there was a problem when the code page was set to something non-ASCII, non-UTF-8).
IIRC there was a problem when the code page was set to something non-ASCII, non-UTF-8
I think I found the original report: https://groups.google.com/g/git-for-windows/c/9_abrdlqi48/m/S9pJE56ZDgAJ
However, I just re-tested with ConEmu and cannot replicate that behavior. Therefore, I will go ahead and just replace the "C"
by "C.UTF-8"
and hope that it does not break anyone's scenario.
@fcharlie I went ahead and updated https://github.com/git-for-windows/git/pull/2834. When you get a chance, please test this:
sdk cd git
,git pull https://github.com/dscho/git/ lctype-should-adapt-to-console-code-page
make -j$(nproc)
git.exe
into a portable Git and test@fcharlie I went ahead and updated #2834. When you get a chance, please test this:
- install Git for Windows' SDK,
sdk cd git
,git pull https://github.com/dscho/git/ lctype-should-adapt-to-console-code-page
- build Git via
make -j$(nproc)
- copy
git.exe
into a portable Git and test
Good I will try it out and give you feedback, thank you.
@dscho After testing, your patch is effective. Thanks.
Very good!
Setup
insert your response here
Details
Windows Terminal
The CJK string is displayed correctly as in Mintty.
Windows Terminal snapshot:
Git CMD snapshot: