git-for-windows / git

A fork of Git containing Windows-specific patches.
http://gitforwindows.org/

File-listing output is not converted with iconv automatically, so `git status/ls-files/add -i` print garbled filenames. #5004

Closed GitPopcorn closed 1 week ago

GitPopcorn commented 3 weeks ago

Setup

$ git --version --build-options

git version 2.45.0.windows.1
cpu: x86_64
built from commit: b5d0511969ccd9ab86395c37e5a7619d8b4e7c32
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
$ cmd.exe /c ver

Microsoft Windows [版本 10.0.22631.3593]
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
> type "$env:USERPROFILE\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: VisualStudioCode
Custom Editor Path:
Default Branch Option:
Path Option: Cmd
SSH Option: OpenSSH
Tortoise Option: false
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Git Pull Behavior Option: Merge
Use Credential Manager: Enabled
Performance Tweaks FSCache: Enabled
Enable Symlinks: Disabled
Enable Pseudo Console Support: Disabled
Enable FSMonitor: Disabled

All related environment variables and Git configuration settings for character encoding were set to UTF-8, as follows:

set LANG=zh_CN.UTF-8
set LESSCHARSET=utf-8
git config --global i18n.logoutputencoding utf-8
git config --global i18n.commitencoding utf-8
git config --global core.quotepath false

Details

Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

CMD, with CHCP 936 (the console decodes output as GBK)

What commands did you run to trigger this issue? If you can provide a Minimal, Complete, and Verifiable example, this will help us understand the issue.

git status, git ls-files, git add -i

What did you expect to occur after running these commands?

Filenames containing CJK characters should be printed correctly, as they are in the output of commands like git log and git diff.

What actually happened instead?

Git encodes filenames containing CJK characters as UTF-8, and CMD then decodes those bytes as GBK, so they are rendered as garbled text. This is a wrong-encoding result like '闂淇', not escaped Unicode like '\u95ee\u9898\u4fee\u590d'; the original text is '问题修复'. Meanwhile, the same filenames are printed correctly in the output of commands like git log and git diff, so I don't think my configuration is the cause.
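
The mismatch described above can be reproduced outside Git. The following is a minimal Python sketch, not anything taken from Git's source. It assumes Python's built-in `gbk` codec is a close enough stand-in for console code page 936, and it uses `errors="replace"` because not every pair of UTF-8 bytes forms a valid GBK character, so the exact garbled characters may differ from what CMD renders.

```python
# Sketch: simulate a program writing UTF-8 bytes to a console that decodes GBK.
# Assumption: Python's "gbk" codec approximates Windows code page 936.

name = "问题修复"  # the filename text from the report

# What the program emits: the filename encoded as UTF-8.
utf8_bytes = name.encode("utf-8")

# What a GBK console shows: the same bytes decoded with the wrong codec.
# errors="replace" substitutes U+FFFD for byte sequences that are not valid GBK.
garbled = utf8_bytes.decode("gbk", errors="replace")

# What the console shows if the text is encoded as GBK before being written.
converted = name.encode("gbk").decode("gbk")

print(garbled != name)    # the wrongly decoded text no longer matches
print(converted == name)  # the converted text survives intact
```

Each of the four characters takes three bytes in UTF-8 but two in GBK, which is why decoding with the wrong codec splits the bytes at the wrong boundaries.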

Any other details?

Why do I think this is caused by a missing encoding conversion?

  1. Commands like git log and git diff work fine even though all my encoding-related configuration settings and environment variables are set to UTF-8. Git could not print CJK characters correctly in a CMD window that decodes as GBK without an additional conversion step.
  2. As far as I remember, this issue did not appear in every version of Git for Windows; it only started after an upgrade. Sorry, I cannot remember the exact version. I am also trying to downgrade to an older version that works correctly.
  3. I found a dynamic link library, libiconv-2.dll, under %GIT_HOME%\mingw64\bin, which seems to be used for encoding conversion.
  4. After identifying the general cause, I ran some tests with the standalone iconv command and got interesting results:
     4.1. git status | iconv -f UTF-8 -t GBK: The output is correct again, but loses its color in the terminal.
     4.2. git status | iconv -f UTF-8 -t UTF-8: The output is not correct.
     4.3. git config --global alias.st2 "!f(){ git status | iconv -f UTF-8 -t GBK; };f" && git st2: The output is not correct and shows a different kind of garbled text.
     4.4. git config --global alias.st2 "!f(){ git status | iconv -f UTF-8 -t UTF-8; };f" && git st2: The output is correct again, but loses its color in the terminal.
     4.5. git config --global alias.st2 "!f(){ git status | grep \".*\"; };f" && git st2: The output is correct again, but loses its color in the terminal.
     4.6. What causes the difference between the native command and the alias? My guess is that the output of a shell alias is no longer the original bytes, because it has to run in a shell environment. Git then detects the output environment and automatically converts the text printed by the shell command into correctly encoded bytes, so the output is always correct when the command runs inside an alias function with a pipeline. The native git status command does not do this: it sends bytes already encoded as UTF-8 directly to CMD, and the character set used for that encoding cannot be changed by any Git configuration I know of.
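
The two direct iconv runs (4.1 and 4.2) can be modeled with a short Python sketch. The `transcode` helper is hypothetical, written only for illustration, and Python's `gbk` codec is assumed to stand in for code page 936. It suggests why converting UTF-8 to GBK yields bytes a GBK console can read, while converting UTF-8 to UTF-8 changes nothing. It does not try to model the alias cases (4.3 to 4.5), whose behavior depends on how the shell and the console are connected.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: roughly what `iconv -f src -t dst` does to a byte stream."""
    return data.decode(src).encode(dst)


expected = "问题修复"

# Bytes as written by a program that emits UTF-8.
raw = expected.encode("utf-8")

to_gbk = transcode(raw, "utf-8", "gbk")     # models case 4.1
to_utf8 = transcode(raw, "utf-8", "utf-8")  # models case 4.2

# A GBK console decodes the first result back to the intended text.
print(to_gbk.decode("gbk") == expected)
# The second result is byte-for-byte identical to the input, so nothing improves.
print(to_utf8 == raw)
```

The loss of color in the piped runs is most likely a separate effect: by default Git colors its output only when writing to a terminal, and a pipe is not one.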

If the problem was occurring with a specific repository, can you provide the URL to that repository to help us with testing?

This issue occurs in any CMD window running with CHCP 936 and in any repository that contains files with CJK characters in their names, so I don't think a specific repository is needed to reproduce it.

GitPopcorn commented 3 weeks ago

Well dude, I found it works just fine with Git-2.42.0.2-64-bit, which I downloaded on October 22, 2023, with no configuration changes during the reinstallation. So now I'm pretty sure that something changed in the source code of Git for Windows that caused this.

dscho commented 3 weeks ago

Please test with the latest Git for Windows version. I suspect https://github.com/git-for-windows/git/pull/4968 to have fixed your problem already.

dscho commented 1 week ago

I'll go ahead and assume that my suspicion was correct.

GitPopcorn commented 1 week ago

> I'll go ahead and assume that my suspicion was correct.

Sorry for taking so long to reply. I ran into some big trouble last weekend and then forgot about this issue...

I just tested with the latest version, git version 2.45.2.windows.1, and this issue is indeed fixed. Thanks for your work.

dscho commented 1 week ago

Thank you for testing!