cmderdev / cmder

Lovely console emulator package for Windows
https://cmder.app
MIT License
25.74k stars 2.02k forks source link

[Bug] %20 special character issue while navigating #2859

Closed vivekdhiman closed 1 year ago

vivekdhiman commented 1 year ago

Version Information

Cmder version: Latest
Operating system:

Cmder Edition

Cmder Full (with Git)

Description of the issue

I just tried using thill tools. I have repository which contains a folder say My%20Home, when I navigate like : cd "My%20Home" Below error shows up.

prompt filter failed: ...\cmder\vendor\clink.lua:175: invalid capture index stack traceback: [C]: in function 'gsub' ...cmder\vendor\clink.lua:175: in function '?'

How to reproduce

No response

Additional context

No response

Checklist

chrisant996 commented 1 year ago

Confirmed. It's a bug in Cmder's custom prompt, and it exists in several places.

    prompt = string.gsub(prompt, "{cwd}", cwd)

The string.gsub() function says that the third parameter is the replacement string, and that % is an escape character. The second parameter is the search pattern, and % is an escape character there, as well.

A few of the calls to gsub() explicitly use escaped %% in the third parameter; those ones are ok.

Cmder has a verbatim() function that handles escaping the %, but only 3 places use it (and 2 of those places use it technically incorrectly since they assume ANSI escape codes will never contain % characters, which isn't technically guaranteed).

Places that use gsub() need to ensure the second and third parameters are either properly escaped by calling verbatim(), or that escaping is already present (e.g. inside verbatim() itself), or that escaping is guaranteed to never be needed (e.g. when passing a hard-coded string that doesn't contain any %).

This example was missing verbatim(),

-    prompt = string.gsub(prompt, "{cwd}", cwd)
+    prompt = string.gsub(prompt, "{cwd}", verbatim(cwd))

This example needs verbatim() around the whole third parameter,

-            clink.prompt.value = string.gsub(clink.prompt.value, "{git}", " "..color.."("..verbatim(branch)..")")
+            clink.prompt.value = string.gsub(clink.prompt.value, "{git}", verbatim(" "..color.."("..branch..")"))

This example needs verbatim() for both the second and third parameters,

-        cwd = string.gsub(cwd, clink.get_env("HOME"), prompt_homeSymbol)
+        cwd = string.gsub(cwd, verbatim(clink.get_env("HOME")), verbatim(prompt_homeSymbol))

And so on.

chrisant996 commented 1 year ago

Actually, the problem affects both the third and the second parameter. I've updated my earlier response.

DRSDavidSoft commented 1 year ago

@chrisant996 Thank you for the investigation! Can you please have a look at #2013 since it proposes to decode all URI-encoded characters in verbatim()? What is your opinion of that?

Also, since I'm not well versed in Lua and clink scripts, what is the purpose of verbatim() in your opinion, and what is it intended to do, if I may ask?

chrisant996 commented 1 year ago

Also, since I'm not well versed in Lua and clink scripts, what is the purpose of verbatim() in your opinion, and what is it intended to do, if I may ask?

The purpose is what the function comment and original commit description say, which handles a documented behavior of using the gsub() function. I'm uncomfortable with calling it "my opinion", since that would be like saying "in my opinion, the earth is round".

The original commit applied verbatim() to all the necessary places at the time it was first introduced. But since then, new usages of gsub() have been added that didn't use verbatim() where it was needed.

The gsub() function is capable of doing regex-like captures and replacements, which is why the % sign has special meaning. The find() function has an optional plain parameter to disable the regex-like syntax, but the gsub() function doesn't have any way to disable the regex-like syntax, so it's necessary to manually escape % into %% when the intent is to use gsub() without using its regex-like features.

So, maybe an even better approach would be to create a local function plain_gsub(str, find, replace) which internally calls return string.gsub(str, verbatim(find), verbatim(replace)), and then use plain_gsub() in appropriate places (which currently would be all of the places outside of verbatim() itself).

Function comment:

---
-- Makes a string safe to use as the replacement in string.gsub
---
local function verbatim(s)
    s = string.gsub(s, "%%", "%%%%")
    return s
end

Commit description:

Commit:     51e75d4    (51e75d4bb5a76b05e8b96674405749f00ec296a8)
Author:     Martin Böhm <martin.boehm@gmail.com>
Committer:  Benjamin Staneck <Stanzilla@users.noreply.github.com>
Date:       2018/12/19 12:17:45

    add percent escaping for string.gsub (#1991)

    In `string.gsub()`, the `%` character has special meaning and must be escaped to be treated verbatim, otherwise the
    "invalid use of '%' in replacement string" warning will show up.

    This adds a verbatim() function for that purpose. It fixes this warning for situations where `'%` characters are in
    the current path (cwd), version control branch names, or in the previous `PROMPT` set by the user.

@chrisant996 Thank you for the investigation! Can you please have a look at #2013 since it proposes to decode all URI-encoded characters in verbatim()? What is your opinion of that?

Wow, that's fascinating. I don't know enough about svn to say for sure whether that PR will work right for svn in all cases, but it's definitely very wrong for everything else. The issue we're discussing right now (2859) is case in point: the PR would convert My%20Home into My Home, which would be quite incorrect since the directory name is literally My%20Home in the file system.

It might be reasonable to instead add separate calls to decodeURI() only in the places that have to do with svn, but I don't know enough about svn to know if that's really correct to do. What I've read on the internet so far doesn't mention anything about svn encoding characters with %, and only mentions svn being configurable which charset (or in Windows, codepage) to use for file names -- it's hard to find the info because most of the search hits are actually about file content encoding.

EDIT: "The svn book", says that svn only uses % URI encoding for URLs. It sounds like they should not be showing up in non-URLs with svn.

chrisant996 commented 1 year ago

I wonder if @SCWR might be saying that the svn paths in their file system literally have %xx sequences as part of the actual file names? I.e. someone manually "encoded" file names as a workaround to get svn to be able to store them in ASCII instead of solving whatever locale/encoding configuration problem was occurring?

And SCWR wanted to decode them for display purposes since for example a path like "%48%45%4c%4c%4f" isn't human-readable (which decodes to "HELLO", by the way)?

If so, then the intent behind the PR would make sense, but then it would not be a correct approach for anyone else, really.