evilmartians / lefthook

Fast and powerful Git hooks manager for any type of projects.
MIT License
4.75k stars 211 forks source link

Invalid escaping of {staged_files} #786

Open stevebeauge opened 1 month ago

stevebeauge commented 1 month ago

:wrench: Summary

If file names contains some utf8 (I guess) characters, lefthook fails because of invalid escaping.

error replacing {staged_files}: CreateFile C:\code\repro\lefthook-wrong-escape\"files\KO \303\251.json": La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte.

(part of the message is in French, probably the system error lefthook trapped).

Lefthook version

1.7.7

Running on Windows 10. Both reproed using PowerShell 7 and legacy cmd console.

Steps to reproduce

Clone this repo: https://github.com/stevebeauge/repro-lefthook-wrong-escape

git clone https://github.com/stevebeauge/repro-lefthook-wrong-escape
cd repro-lefthook-wrong-escape
pnpm i

Edit the file files/KO é.json. Set whatever you want in the file. If you are using a powershell console you can run

Get-Date | ConvertTo-Json | Set-Content -LiteralPath '.\files\KO é.json' -Encoding utf8 

Then try to commit the changes:

git add .
git commit -m "Something"

Expected results

It should work whatever the file name is (assuming it's a valid filename for the system).

Actual results

It fails because utf8 character is escaped, but the escape character is itself not properly escaped:

error replacing {staged_files}: CreateFile C:\code\repro\lefthook-wrong-escape\"files\KO \303\251.json": La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte.

Possible Solution

Logs / Screenshots

Running pnpm lefthook run pre-commit -v in a powershell console generates this logs:

│ [lefthook] cmd: [git rev-parse --show-toplevel]
│ [lefthook] err: <nil>
│ [lefthook] out: C:/code/repro/lefthook-wrong-escape

│ [lefthook] cmd: [git rev-parse --git-path hooks]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/hooks

│ [lefthook] cmd: [git rev-parse --git-path info]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/info

│ [lefthook] cmd: [git rev-parse --git-dir]
│ [lefthook] err: <nil>
│ [lefthook] out: .git

│ [lefthook] cmd: [git hash-object -t tree /dev/null]
│ [lefthook] err: <nil>
│ [lefthook] out: 4b825dc642cb6eb9a060e54bf8d69288fbee4904

╭──────────────────────────────────────╮
│ 🥊 lefthook v1.7.7  hook: pre-commit │
╰──────────────────────────────────────╯
│ [lefthook] cmd: [git status --short --porcelain]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: M  "files/KO \303\251.json"

│ [lefthook] cmd: [git diff --name-only --cached --diff-filter=ACMR]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: "files/KO \303\251.json"

│  prettier (skip) error replacing {staged_files}: CreateFile C:\code\repro\lefthook-wrong-escape\"files\KO \303\251.json": La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte.
│ [lefthook] cmd: [git stash list]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out:

  ────────────────────────────────────
summary: (done in 0.28 seconds)
🥊  prettier: error replacing {staged_files}: CreateFile C:\code\repro\lefthook-wrong-escape\"files\KO \303\251.json": La syntaxe du nom de fichier, de répertoire ou de volume est incorrecte.
╭

As a comparison, modifying files/ok.json is working. It's verbose output is:

│ [lefthook] cmd: [git rev-parse --show-toplevel]
│ [lefthook] err: <nil>
│ [lefthook] out: C:/code/repro/lefthook-wrong-escape

│ [lefthook] cmd: [git rev-parse --git-path hooks]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/hooks

│ [lefthook] cmd: [git rev-parse --git-path info]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/info

│ [lefthook] cmd: [git rev-parse --git-dir]
│ [lefthook] err: <nil>
│ [lefthook] out: .git

│ [lefthook] cmd: [git hash-object -t tree /dev/null]
│ [lefthook] err: <nil>
│ [lefthook] out: 4b825dc642cb6eb9a060e54bf8d69288fbee4904

╭──────────────────────────────────────╮
│ 🥊 lefthook v1.7.7  hook: pre-commit │
╰──────────────────────────────────────╯
│ [lefthook] cmd: [git status --short --porcelain]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: M  files/ok.json

│ [lefthook] cmd: [git diff --name-only --cached --diff-filter=ACMR]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: files/ok.json

│ [lefthook] files before filters:
[files/ok.json]
│ [lefthook] files after filters:
[files/ok.json]
│ [lefthook] files after escaping:
[files/ok.json]
│ [lefthook] executing: pnpm prettier files/ok.json
┃  prettier ❯

"2024-07-25T18:29:38.9212089+02:00"

│ [lefthook] cmd: [git stash list]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out:

  ────────────────────────────────────
summary: (done in 1.38 seconds)
✔️  prettier
mrexox commented 1 month ago

Hey! Thank you for submitting this issue. I couldn't reproduce the issue on my macOS, so I assume this is something OS-related. I will try to figure it out.

mrexox commented 1 month ago

From my perspective it looks like git command returns the file list in different encoding.

Could you try setting the default encoding for Powershell to UTF-8 (if not set yet)?

I found some useful options here: https://stackoverflow.com/questions/57131654/using-utf-8-encoding-chcp-65001-in-command-prompt-windows-powershell-window/57134096#57134096

stevebeauge commented 1 month ago

It is already on utf8 code page (65001).

Which git command lefthook runs? Can I "test" the single comment to see how it behaves?

stevebeauge commented 1 month ago

image

mrexox commented 1 month ago

Yes, it's git diff --name-only --cached --diff-filter=ACMR. So, it returns the same corrupted result when executed explicitly, right?

mrexox commented 1 month ago

Maybe this can help: https://stackoverflow.com/questions/10651975/unicode-utf-8-with-git-bash

stevebeauge commented 1 month ago

Ok, found a workaround.

git config core.quotepath false
git status

Returns:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   files/KO é.json

Restoring the config

git config core.quotepath true
git status

Returns:

Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   "files/KO \303\251.json"

I close the issue. It's not related to lefthook, but to git actually. Thanks for the help

stevebeauge commented 1 month ago

Sorry, I reopen the issue.

The git config I set remove the escaping, but also the surrounding quote, which breaks lefthook

Without quotepath set to false: modified: "files/KO \303\251.json"

With quotepath set to true (which is the git default): modified: files/KO é.json. No quote around the file name.

This leads to lefthook error:

│ [lefthook] cmd: [git rev-parse --show-toplevel]
│ [lefthook] err: <nil>
│ [lefthook] out: C:/code/repro/lefthook-wrong-escape

│ [lefthook] cmd: [git rev-parse --git-path hooks]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/hooks

│ [lefthook] cmd: [git rev-parse --git-path info]
│ [lefthook] err: <nil>
│ [lefthook] out: .git/info

│ [lefthook] cmd: [git rev-parse --git-dir]
│ [lefthook] err: <nil>
│ [lefthook] out: .git

│ [lefthook] cmd: [git hash-object -t tree /dev/null]
│ [lefthook] err: <nil>
│ [lefthook] out: 4b825dc642cb6eb9a060e54bf8d69288fbee4904

╭──────────────────────────────────────╮
│ 🥊 lefthook v1.7.7  hook: pre-commit │
╰──────────────────────────────────────╯
│ [lefthook] cmd: [git status --short --porcelain]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: M  "files/KO é.json"

│ [lefthook] cmd: [git diff --name-only --cached --diff-filter=ACMR]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out: files/KO é.json

│ [lefthook] files before filters:
[files/KO é.json]
│ [lefthook] files after filters:
[files/KO é.json]
│ [lefthook] files after escaping:
['files/KO é.json']
│ [lefthook] executing: pnpm prettier 'files/KO é.json'
⠋ waiting: prettier[error] No files matching the pattern were found: "'files/KO".
[error] No files matching the pattern were found: "é.json'".
┃  prettier ❯

│ [lefthook] cmd: [git stash list]
│ [lefthook] dir: C:/code/repro/lefthook-wrong-escape
│ [lefthook] err: <nil>
│ [lefthook] out:

  ────────────────────────────────────
summary: (done in 1.33 seconds)
🥊  prettier

I'm not sure how lefthook works internally, but I wonder if it run a git command and analyses the results, or if it uses any package. If it works using the former, maybe there's something to do to be more reliable ?

stevebeauge commented 1 month ago

I don't know if it can solve the issue, but git diff has a -z flag that does not encode the file name:

git diff --name-only --cached --diff-filter=ACMR                                                                                                                                                            
"files/KO \303\251.json"

git diff --name-only --cached --diff-filter=ACMR -z                                                                                                                                                         
files/KO é.json^@

The final char is a NUL char. So what about adding the flag to lefthook, take the output minus the last NUL char and adding surrounding quotes?

mrexox commented 1 month ago

Could you try wrapping the {staged_files} in double quotes? Like this:

pre-commit:
  commands:
    lint:
      run: yarn lint "{staged_files}"

Of single quotes if double quotes don't work. This quoting should wrap each filename with the quotes.

stevebeauge commented 1 month ago

If I both wrap the {staged_files} with quotes and set git config core.quotepath false, it works as expected.

However, I don't feel comfortable with this solution. Git options are not committed as part of the source code. It requires every developer to set the option, and will eventually lead to issues or conflict with other tools that may rely on git commands (I'd bet turborepo or NX are using same technics to detect affected projects).

I never works with go and I have no idea of the impact/complexity, but I'd wish lefthook take the normalization into account.

From the git doc

Path names are encoded in UTF-8 normalization form C.

Maybe there's some lib in go that can revert the normalization back to plain utf8 encoding ?

(🤫 https://chatgpt.com/share/448c4b2f-c19d-4532-a57c-c7c0d9573a44)

This would create a bullet proof and cross platform way to read the file names.

mrexox commented 1 month ago

For the last issue the problem is that Windows doesn't work well with single quotes (as far as I know), and considers 'files/KO é.json' as two files: 'files/KO and é.json'. So, this is not the encoding issue here.

But I like your idea for normalization of the output. It makes sense. I'll put it on my radar :+1:

nicky1038 commented 1 month ago

I've also bumped into an issue with wrong escaping of {staged_files}, but it was of another kind.

If file name contains square brackets ([ or ]), it will be additionally wrapped with single quotes, making file name invalid for eslint, for example.

mrexox commented 1 month ago

@nicky1038 , in this case use "{staged_files}" in your configuration, so every file is wrapped in double quotes when calling with eslint

nicky1038 commented 1 month ago

@nicky1038 , in this case use "{staged_files}" in your configuration, so every file is wrapped in double quotes when calling with eslint

@mrexox Thank you for the quick response, the suggested solution works like a charm! And sorry for not reading docs before asking questions 😶