Excessive clink execution even if `cmd.exe` called without interactive input

chrisant996 / clink

Bash's powerful command line editing in cmd.exe

https://chrisant996.github.io/clink/

GNU General Public License v3.0

3.44k stars 135 forks source link

Excessive clink execution even if `cmd.exe` called without interactive input #649

Closed andry81 closed 1 month ago

andry81 commented 1 month ago

There is a set of cases where the cmd.exe command line interactivity is not the case, but the clink does still execute. This slowdowns a batch script execution.

Example:

1.bat:

@echo off

for /F "usebackq tokens=* delims=" %%i in (`@ver`) do set "RETURN_VALUE=%%i"

set RETURN_VALUE

Open the console.
Run the script.

The ProcessMonitor shows clink execution after the @ver command: C:\WINDOWS\system32\cmd.exe /c @ver

But the /c flag means that there won't be any interaction between user and the console.

This is not the only case where the clink does execute when it is not necessary to execute.

Another example:

If cmd.exe called with redirected stdin or stdout, then the user interactivity is disabled on the command line by the cmd.exe itself and, for example, TAB character on the stdin behaves like the tabulation character instead of command line completion key.

OS: Windows 8.1

clink info

version          : 1.6.18.80e2a3
injected         : clink_dll_x64.dll
system           : 6.3.09600.20778
codepage         : 1251
keyboard langid  : 1033
keyboard layout  : 00000409

chrisant996 commented 1 month ago

Issues #312 and #361 have some related info, but I can't find the issue(s) where I described how autorun and injecting work together, so I'll describe it here. I'll also add a section in the documentation.

Personally, I do not recommend turning on autorun for Clink:

I haven't removed the feature because (1) it existed as the default mode before I ever found Clink and a lot of people like the feature, and (2) it does the best it can do given how the CMD AutoRun regkey operates.

Personally, I do not use Clink with autorun for three reasons: (1) it slows down startup of cmd.exe processes that happen in the background or in automated situations, (2) it introduces the possibility of interfering with background or automated cmd usage, and (3) I don't even want Clink auto-injected into every cmd session, and I prefer to be explicit about when I'm going to use Clink. I use LNK files or Windows Terminal profiles to control how and when Clink is injected.

There is a difference between executing clink_x64.exe versus injecting Clink:

You're observing that clink_x64.exe gets launched. Yes, that's entirely accurate and expected!

Clink avoids injecting itself into the cmd.exe process if it detects that the cmd.exe instance will not be interactive. But "inject" doesn't mean "run clink_x64.exe", instead "inject" means "inject clink code into the cmd.exe process and hook system APIs so that Clink can intercept things".

There is no way for a batch script to figure out whether cmd.exe is going to be interactive. To detect that, some .exe program is needed. And the clink_x64.exe program is able to remotely inspect the cmd.exe process and its command line, and assess whether the cmd.exe process will be interactive. If it assesses that it will be non-interactive, then clink_x64.exe exits _without injecting clink_dllx64.dll into the cmd.exe process. The act of checking whether cmd.exe will be interactive is much faster than the act of injecting Clink into the cmd.exe process, and so the check minimizes the amount of slowdown that occurs.

But slowdown of background/automated cmd.exe invocations is unavoidable, if Clink is configured for autorun.

That's working as expected.

How to prevent clink_x64.exe from running:

There are several ways:

The cmd /d flag disables CMD's AutoRun regkey processing, which will prevent CMD from running the Clink autorun script.
If the %CLINK_NOAUTORUN% environment variable is set, then the Clink autorun script exits quickly, without even invoking clink_x64.exe.
Clink autorun can be uninstalled: clink autorun uninstall.

Why doesn't clink_x64.exe skip injecting when stdin or stdout are redirected?

The goal of trying to detect "interactive cmd.exe session" was to reduce the slowdown in the most common cases of background/automated usage.

Redirected stdin/stdout without /c isn't the most common case, and I didn't go to the trouble of trying to write code that reliably and accurately analyzes stdin/stdout handles to assess whether injecting Clink will end up being a no-op. But that's not even something that can be reliably and accurately assessed from outside the cmd.exe process itself. The only way to reliably and accurately check for redirected stdin/stdout is to do the check after having already injected Clink, which would be pointless since by that time Clink has already been injected, so the full slowdown cost has already been paid. It's true that clink_x64.exe could attempt to parse the cmd.exe command line for < and > characters to try to guess whether stdin or stdout are redirected, but trying to parse CMD language syntax is very complicated, and it's very easy to accidentally misinterpret a < or > and reach the wrong conclusion (for example, quotes and ^ can change how CMD interprets the symbols, and there are quirks about how CMD handles quotes which make it very complicated to accurately emulate how CMD is going to end up treating the symbols in different kinds of syntax).

Even if I added such a check, it would have to go inside clink_x64.exe itself. It wouldn't prevent clink_x64.exe from running. It would merely save spending a several milliseconds to do a superfluous injection of clink_dll_x64.dll into the cmd.exe process.

So, overall, trying to check for stdin/stdout isn't something that has much practical value.

andry81 commented 1 month ago

There is no way for a batch script to figure out whether cmd.exe is going to be interactive. To detect that, some .exe program is needed.

What about flag /c? Isn't it for that purpose?

And the clink_x64.exe program is able to remotely inspect the cmd.exe process and its command line, and assess whether the cmd.exe process will be interactive.

Why not inspect this from the CreateProcess from the cmd.exe itself? Why the clink_x64.exe keeps create processes even if the very first cmd.exe is executed? It would be much faster to not run redundant processes at all.

But slowdown of background/automated cmd.exe invocations is unavoidable, if Clink is configured for autorun.

It slow downs the existing scripts even if there is no any explicit/implicit cmd.exe calls.

I have a project with bunch of such scripts and tests shows slowdown: https://github.com/andry81/contools/tree/HEAD/Scripts/Tests/bench/batscripts

If run test_all.bat

CLINK_NOAUTORUN=1	CLINK_NOAUTORUN=0

All tests here except test_build__load_config_01.bat does not call to cmd.exe explicitly or implicitly. I suspect clink somehow interfere with the cmd.exe inner parser calls like an environment variable read or ReadFile API which is heavily used by the cmd.exe itself.

The cmd /d flag disables CMD's AutoRun regkey processing, which will prevent CMD from running the Clink autorun script.

I can not use that because scripts must run even without clink as is and an additional not expected flag could alter the existing behavior. On another hand there is cases where I can not even add the flag such like those from the 1.bat script in the first message.

If the %CLINK_NOAUTORUN% environment variable is set, then the Clink autorun script exits quickly, without even invoking clink_x64.exe.

Seems the only reasonable way to accomplish this.

The only way to reliably and accurately check for redirected stdin/stdout is to do the check after having already injected Clink, which would be pointless since by that time Clink has already been injected, so the full slowdown cost has already been paid.

That's why CreateProcess interception is better than the run another clink_x64.exe instance.

It's true that clink_x64.exe could attempt to parse the cmd.exe command line for < and > characters to try to guess whether stdin or stdout are redirected

You wouldn't need that if call GetFileType from intercepted CreateProcess call. Any type other than FILE_TYPE_CHAR is a redirected handle.

chrisant996 commented 1 month ago

There is no way for a batch script to figure out whether cmd.exe is going to be interactive. To detect that, some .exe program is needed.

What about flag /c? Isn't it for that purpose?

How is a batch script supposed to figure out what the full raw command line parameters were to the cmd.exe process that hosts the batch script? If you think you know some way to accomplish that, please share.

And the clink_x64.exe program is able to remotely inspect the cmd.exe process and its command line, and assess whether the cmd.exe process will be interactive.

Why not inspect this from the CreateProcess from the cmd.exe itself? Why the clink_x64.exe keeps create processes even if the very first cmd.exe is executed? It would be much faster to not run redundant processes at all.

How is a batch script supposed to intercept an OS API? It's impossible. If you think you know some way to accomplish that, please share.

"AutoRun" is accomplished by setting the CMD AutoRun regkey. Once set, then CMD.EXE itself runs the commands in the regkey, every time CMD.EXE runs. That's why clink.bat and clink_x64.exe try to analyze the cmd.exe process they've been told to inject into, before actually spending the extra time to inject into it.

But slowdown of background/automated cmd.exe invocations is unavoidable, if Clink is configured for autorun.

It slow downs the existing scripts even if there is no any explicit/implicit cmd.exe calls.

That's how the CMD AutoRun regkey works. Anything that gets set into that regkey runs every single time any cmd.exe process is started even if the cmd.exe process is not actually visible to the user.

But you're mistaken about one thing: "even if there is no any explicit/implicit cmd.exe calls". No. The only time that CMD runs the AutoRun regkey commands is when a new cmd.exe process starts. So if that's indeed the source of the slowdown, then there are definitely additional cmd.exe processes being created. Clink of course has no control over whether things start new cmd.exe processes.

I have a project with bunch of such scripts and tests shows slowdown: https://github.com/andry81/contools/tree/HEAD/Scripts/Tests/bench/batscripts

If run test_all.bat

CLINK_NOAUTORUN=1 CLINK_NOAUTORUN=0 All tests here except test_build__load_config_01.bat does not call to cmd.exe explicitly or implicitly. I suspect clink somehow interfere with the cmd.exe inner parser calls like an environment variable read or ReadFile API which is heavily used by the cmd.exe itself.

Tonight I'll look at the test scripts.

You stated that there are no additional cmd.exe processes getting started, neither explicitly nor implicitly. Have you tried using ProcMon to test whether any additional cmd.exe processes are getting created?

The only way to reliably and accurately check for redirected stdin/stdout is to do the check after having already injected Clink, which would be pointless since by that time Clink has already been injected, so the full slowdown cost has already been paid.

That's why CreateProcess interception is better than the run another clink_x64.exe instance.

How is Clink supposed to intercept CreateProcess calls inside cmd.exe before Clink has been injected into cmd.exe? It's impossible.

But more importantly, how is Clink supposed to intercept the OS CreateProcess API inside Explorer.exe and other processes that spawn cmd.exe? It's impossible.

Clink is not the one that processes the CMD AutoRun regkey. Clink cannot influence how CMD processes the CMD AutoRun regkey. The way AutoRun injects Clink is by always running clink.bat, which internally checks CLINK_NOAUTORUN, and then runs clink_x64.exe which does some more checks, and finally does the really expensive part of doing cross-process remote thread injection into the target cmd.exe process.

This is simply how CMD AutoRun works. I can't change CMD.

I agree the performance cost is not desirable. I do not have my Clink configured for autorun. I actually recommend that no one configure it that way. There is no way to avoid performance overhead when using AutoRun. And that goes for any program in the CMD AutoRun regkey, not just Clink.

It's true that clink_x64.exe could attempt to parse the cmd.exe command line for < and > characters to try to guess whether stdin or stdout are redirected

You wouldn't need that if call GetFileType from intercepted CreateProcess call. Any type other than FILE_TYPE_CHAR is a redirected handle.

No... Clink is not in control of how AutoRun works. Your understanding of the AutoRun mechanism isn't accurate.

andry81 commented 1 month ago

How is a batch script supposed to figure out what the full raw command line parameters were to the cmd.exe process that hosts the batch script? If you think you know some way to accomplish that, please share.

If you call cmd.exe from this batch file :) But I don't think a batch script needs to know this anyway.

How is a batch script supposed to intercept an OS API? It's impossible. If you think you know some way to accomplish that, please share.

You didn't get this. The clink only call once, the rest of interception is made from the cmd.exe itself. The method how this can be integrated into the registry is another question.

You stated that there are no additional cmd.exe processes getting started, neither explicitly nor implicitly. Have you tried using ProcMon to test whether any additional cmd.exe processes are getting created?

This is how I have found that. :)

But more importantly, how is Clink supposed to intercept the OS CreateProcess API inside Explorer.exe and other processes that spawn cmd.exe? It's impossible.

By changing the loading method. For example through the "Always Loaded DLLs" feature: https://learn.microsoft.com/en-us/windows/win32/win7appqual/appinit-dlls-in-windows-7-and-windows-server-2008-r2

chrisant996 commented 1 month ago

How is a batch script supposed to figure out what the full raw command line parameters were to the cmd.exe process that hosts the batch script? If you think you know some way to accomplish that, please share.

If you call cmd.exe from this batch file :) But I don't think a batch script needs to know this anyway.

You want AutoRun to not run clink_x64.exe. I get why. I'd like that, too.

CMD runs the commands in its AutoRun regkey. When configured for autorun, Clink adds a call to clink.bat into the AutoRun regkey. The clink.bat script is the only opportunity Clink has to avoid running an EXE or loading a DLL into cmd.exe.

So, the only way to avoid running clink_x64.exe when configured for autorun, would be for the clink.bat script to be able to inspect the raw command line from its host cmd.exe process. And that's impossible.

So, the clink.bat script has to launch clink_x64.exe, which is able to inspect the raw command line of the host cmd.exe process. When autorun is configured, then the best that Clink can do is respond to CLINK_NOAUTORUN, or to launch clink_x64.exe and at least avoid the DLL injection and hooking system APIs inside CMD.

Which is part of why I do not use autorun -- I'm not willing to pay the performance penalty.

How is a batch script supposed to intercept an OS API? It's impossible. If you think you know some way to accomplish that, please share.

You didn't get this. The clink only call once, the rest of interception is made from the cmd.exe itself. The method how this can be integrated into the registry is another question.

I understand what you're suggesting. But Clink cannot run any of its code until after Clink is invoked! And the whole point is to try to avoid invoking the EXE or DLL, but your suggestions are about how to change code that's in the EXE or DLL -- but that code isn't involved until after the performance cost already been paid.

You stated that there are no additional cmd.exe processes getting started, neither explicitly nor implicitly. Have you tried using ProcMon to test whether any additional cmd.exe processes are getting created?

This is how I have found that. :)

Can you share the procmon trace that shows exactly only one cmd.exe instance started, but lots of clink_x64.exe instances getting started? That doesn't make any sense, and I can't even begin to imagine how that could happen. The only thing that automatically runs Clink is the AutoRun regkey, the only thing that runs the AutoRun regkey is CMD itself, and it only does that once when the CMD process starts.

But more importantly, how is Clink supposed to intercept the OS CreateProcess API inside Explorer.exe and other processes that spawn cmd.exe? It's impossible.

By changing the loading method. For example through the "Always Loaded DLLs" feature: https://learn.microsoft.com/en-us/windows/win32/win7appqual/appinit-dlls-in-windows-7-and-windows-server-2008-r2

No, that affects all processes in the system. That would be an even bigger performance degradation, affecting all processes.

chrisant996 commented 1 month ago

Also, AppInit_Dlls is incompatible with secure boot, and Windows 11 requires secure boot. AppInit_Dlls is also strongly discouraged because it creates performance problems and can cause system deadlocks. It isn't something Clink could use.

andry81 commented 1 month ago

CMD runs the commands in its AutoRun regkey. When configured for autorun, Clink adds a call to clink.bat into the AutoRun regkey. The clink.bat script is the only opportunity Clink has to avoid running an EXE or loading a DLL into cmd.exe.

So, the only way to avoid running clink_x64.exe when configured for autorun, would be for the clink.bat script to be able to inspect the raw command line from its host cmd.exe process. And that's impossible.

So, the clink.bat script has to launch clink_x64.exe, which is able to inspect the raw command line of the host cmd.exe process. When autorun is configured, then the best that Clink can do is respond to CLINK_NOAUTORUN, or to launch clink_x64.exe and at least avoid the DLL injection and hooking system APIs inside CMD.

Another way is to ignore child calls to clink.exe and keep run from intercepted CreateProcess from inside the cmd.exe.

I understand what you're suggesting. But Clink cannot run any of its code until after Clink is invoked! And the whole point is to try to avoid invoking the EXE or DLL, but your suggestions are about how to change code that's in the EXE or DLL -- but that code isn't involved until after the performance cost already been paid.

Create a shared section in the memory before call to CreateProcess from cmd.exe. When clink.exe runs, then it reads the shared section and does nothing. The cmd.exe now injects into child cmd.exe instead. No costs.

Can you share the procmon trace that shows exactly only one cmd.exe instance started, but lots of clink_x64.exe instances getting started? That doesn't make any sense, and I can't even begin to imagine how that could happen. The only thing that automatically runs Clink is the AutoRun regkey, the only thing that runs the AutoRun regkey is CMD itself, and it only does that once when the CMD process starts.

It does NOT run. Just put the pause between the calls:

pause
call "%%TESTS_PROJECT_ROOT%%/test_std.bat"
pause
call "%%TESTS_PROJECT_ROOT%%/test_std_encode.bat"
pause
rem NOTE: clink.exe calls at the end of this script
call "%%TESTS_PROJECT_ROOT%%/test_build.bat"
pause

chrisant996 commented 1 month ago

CMD runs the commands in its AutoRun regkey. When configured for autorun, Clink adds a call to clink.bat into the AutoRun regkey. The clink.bat script is the only opportunity Clink has to avoid running an EXE or loading a DLL into cmd.exe. So, the only way to avoid running clink_x64.exe when configured for autorun, would be for the clink.bat script to be able to inspect the raw command line from its host cmd.exe process. And that's impossible. So, the clink.bat script has to launch clink_x64.exe, which is able to inspect the raw command line of the host cmd.exe process. When autorun is configured, then the best that Clink can do is respond to CLINK_NOAUTORUN, or to launch clink_x64.exe and at least avoid the DLL injection and hooking system APIs inside CMD.

Another way is to ignore child calls to clink.exe and keep run from intercepted CreateProcess from inside the cmd.exe.

Clink can only intercept an API in the same process where Clink has already been injected.

The CreateProcess during AutoRun is happening in a completely new process where Clink has not yet been injected.

It's impossible for Clink to intercept the CreateProcess call that's launching clink_x64.exe, because the CreateProcess is happening in a process that doesn't have any Clink code yet.

I understand what you're suggesting. But Clink cannot run any of its code until after Clink is invoked! And the whole point is to try to avoid invoking the EXE or DLL, but your suggestions are about how to change code that's in the EXE or DLL -- but that code isn't involved until after the performance cost already been paid.

Create a shared section in the memory before call to CreateProcess from cmd.exe. When clink.exe runs, then it reads the shared section and does nothing. The cmd.exe now injects into child cmd.exe instead. No costs.

You're missing that the CreateProcess is happening inside a cmd.exe where Clink has not yet been injected (you can observe this in a ProcMon trace by comparing process IDs and by looking at when clink_dll_x64.dll gets loaded into a process).

It's impossible to do that suggestion.

chrisant996 commented 1 month ago

Can you share the procmon trace that shows exactly only one cmd.exe instance started, but lots of clink_x64.exe instances getting started? That doesn't make any sense, and I can't even begin to imagine how that could happen. The only thing that automatically runs Clink is the AutoRun regkey, the only thing that runs the AutoRun regkey is CMD itself, and it only does that once when the CMD process starts.

It does NOT run. Just put the pause between the calls:
pause
call "%%TESTS_PROJECT_ROOT%%/test_std.bat"
pause
call "%%TESTS_PROJECT_ROOT%%/test_std_encode.bat"
pause
rem NOTE: clink.exe calls at the end of this script
call "%%TESTS_PROJECT_ROOT%%/test_build.bat"
pause

Oh, I think I understand now:

This one issue (649) is reporting two completely separate issues:

Starting a cmd.exe process launches clink_x64.exe when Clink is configured for AutoRun. Most of our conversation has been about that.
Having Clink injected causes a small but measurable performance degradation in batch scripts. This is not related at all to the first issue (I thought you were saying the first issue was causing the second, but you've confirmed that they're unrelated).

Regarding scripts taking longer to run

Yes. Clink has to hook certain system APIs in cmd.exe, which adds code that has to run in those APIs. More code takes more time to run.

The only way to avoid that is to not inject Clink.

The implementations of the API hooks are here and here.

andry81 commented 1 month ago

2. Having Clink injected causes a small but measurable performance degradation in batch scripts.

I won't call 30% a small degradation. It's quite a notable slowdown.

andry81 commented 1 month ago

The implementations of the API hooks are here and here.

Thx for the info. As I suggested this is interference with the environment variable API.

chrisant996 commented 1 month ago

Having Clink injected causes a small but measurable performance degradation in batch scripts.

I won't call 30% a small degradation. It's quite a notable slowdown.

The duration of overhead is measured in hundredths of seconds.

The amount of overhead is proportional to the amount of environment variable setting that occurs and the amount of times that output is printed.

If you want less overhead, there are two possibilities:

Use a faster CPU.
Don't use Clink.

The source code links I shared show the additional code that's being executed. I don't know how to have less overhead.

Would you like to try to find a way and make a PR?

chrisant996 commented 1 month ago

I made a batch script that runs 5000 iterations of setting an environment variable. Without Clink, it takes 0.16 seconds on my computer. With Clink, it takes 0.25 seconds on my computer.

That's an average of an additional 20 microseconds per environment variable set. That's very small, but it's much longer than I'd expect.

I did some profiling, and found that almost the entire overhead is coming from the seh_scope seh; line. That line swaps a different Unhandled Exception Filter so that Clink can intercept crashes caused by Clink, and give a meaningful report.

It would not be a good idea to remove the crash attribution by default.

However, I would be willing to add a setting that allows disabling the crash attribution, to reduce the overhead cost. Are the milliseconds of overhead important enough for your needs, to justify adding a way to disable crash attribution?

chrisant996 commented 1 month ago

Actually, I can do better than a setting:

Clink already installs an unhandled exception filter when it gets loaded. So, it doesn't need to set the unhandled exception filter inside each API call anymore.

I can change it to merely periodically detect if something else has overwritten Clink's unhandled exception filter, and print to the log and mention it in output from the clink-diagnostics key binding.

So, I was wrong, there is a way to remove most of the overhead (but not all of it). Because the overhead is coming from work happening inside an OS API, not from work done directly by Clink code. Yes, Clink calls the OS API, but I mean that the OS API looked like it would be trivial and not multiply the overhead cost by 10.

To get a meaningful measurement, I had to change the test script to run 50,000 iterations of setting an environment variable. Without Clink, it takes 1.16 seconds on my computer. With Clink, it takes 1.29 seconds on my computer.

That's an average of 2 microseconds overhead per env var set, instead of 20 microseconds. That's the overhead I was originally expecting per call.

There was an important piece of information that wasn't in the original report: an estimate of the total number of environment variable set operations performed by the scripts. I had to analyze the contools repo and learn what all the scripts are doing, to realize that they're actually doing very little. Which means the numbers are high because the scripts are running on a machine with a CPU that is much slower than the one in my gaming laptop.

Once I realized that, then it was clear that the overhead was roughly 10x what I expected.

andry81 commented 1 month ago

Which means the numbers are high because the scripts are running on a machine with a CPU that is much slower than the one in my gaming laptop.

@chrisant996 You can run test bench scripts on your laptop to see the difference.

chrisant996 commented 1 month ago

@andry81 I'm curious to hear the script timings when you use Clink v1.6.21.

andry81 commented 4 weeks ago

@andry81 I'm curious to hear the script timings when you use Clink v1.6.21.

CLINK_NOAUTORUN=1	CLINK_NOAUTORUN=0

chrisant996 commented 4 weeks ago

Interesting! Thanks for sharing. I expected more difference to exist than that.

On my hardware, the change only cuts 90% of the overhead, but on your hardware it cuts almost the entire overhead.

Maybe the fact that I'm using a VM amplifies the overhead.

I'm glad the change ended up making an even bigger difference than the testing suggested it would.