Open chopinrlz opened 1 month ago
Hi,
Given all of your file and data operations are performed directly with the .NET API I suggest this is not a PowerShell problem, it looks like it is a .NET issue.
Is your issue that PowerShell is not running the garbage collector? Does running
[System.GC]::Collect()
make any difference?
Also be aware that managed memory runtimes often take memory from the OS in order to allocate objects but never return it. So the objects may have been released/disposed/freed so the actual CLR heap is free but the memory has not been returned to the OS. This is completely normal as managed runtimes (CLR, JVM etc) assume that if they needed it once they are likely to need it again so no point giving it back to the OS.
You would need to look for tools to examine the state of the CLR heap within the process rather than external process monitoring tools.
When you have an operation that you know is memory intensive then two other options are available,
I hope that helps.
I cannot replicate the slowness you see as 7.4.2 takes less than 5 seconds for me and is in fact faster than WinPS but I do notice the large memory usage. My guess is that it is not only are now storing the 2 byte arrays (raw file data and the decoded base64 string) but also the base64 string is all allocated on the heap as part of the operation. Potentially WinPS/.NET Framework is more aggressive in reusing the array values but as per the above the CLR could be allocating the memory and just never freeing it so it can more efficient reuse the memory in the future.
Putting aside the above comment you can more efficiently base64 encode bytes by streaming it rather than reading all the input bytes into memory.
Function ConvertTo-Base64String {
[OutputType([string])]
[CmdletBinding()]
param (
[Parameter(Mandatory)]
[string]$Path
)
$fs = $cryptoStream = $sr = $null
try {
$fs = [System.IO.File]::OpenRead($Path)
$cryptoStream = [System.Security.CryptoGraphy.CryptoStream]::new(
$fs,
[System.Security.Cryptography.ToBase64Transform]::new(),
[System.Security.Cryptography.CryptoStreamMode]::Read)
$sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
$sr.ReadToEnd()
}
finally {
${sr}?.Dispose()
${cryptoStream}?.Dispose()
${fs}?.Dispose()
}
}
This will stream the raw bytes from the source file stream and produce the final output string. If you are storing this string into a file then you could optimize it further by streaming the output base64 CryptoStream to a file avoiding having to store all the data in PowerShell.
If you do need to store the base64 string as an object in PowerShell keep in mind this means you not only have to store the inflated size that base64 uses (($length / 3) * 4
) but each char of the string takes two bytes so at a minimum you are looking at around 600MB for the dotnet installer. Using the above function I see the memory usage sits around 1.2GB which is less than WinPS (about 1.4GB). While this is still more than the ~600MB there could be other factors in place here like the CLR allocating more memory than strictly needs in anticipation of future needs or some other reason.
Thank you both for your responses and suggestions for optimization. I did reply to the Issue cross-posted in the dotnet/runtime project which you can see here https://github.com/dotnet/runtime/issues/101061#issuecomment-2058103434
I ran [GC]::Collect()
after running the test script in both WinPS and PS7 and noticed that in PS7 almost none of the used memory was released back to the operating system, which seemed suspicious as in WinPS almost all memory is released after the test.
UPDATE: Confirmed below that the FromBase64String delay and excessive memory usage only occurs through PowerShell 7, but not through a .NET console app. ~This definitely appears to be a .NET run-time issue and not a PowerShell issue, however I wanted to post it here as well as over there for visibility in case others encounter this oddity while building PowerShell modules (which is how I stumbled upon it).~
I am aware that storing a 200 MB file in memory as base64 text is wildly inefficient, and is not the intention of how my PowerPass module should be used (which is how I discovered this in the first place), but since I stumbled upon this unexpected behavior I thought it prudent to at least report it.
But again, I appreciate all of the comments and feedback here, especially the suggestions for optimization techniques.
In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB.
Yes, the same memory usage, but time to complete is about 1.2 seconds
I retested the following updated script on my desktop PC. A Ryzen 5800X with 128 GB of RAM and PCIe Gen4 NVMe storage. The test ran much faster as expected, but the memory usage still remains high.
Even after invoking [GC]::Collect()
around 3.2 GB of RAM still remains utilized by the pwsh
process.
Reading the 222 MB file into memory takes 0.06 seconds and converting it to base64 takes 0.22 seconds and uses 845 MB of RAM across both operations as expected. The last operation [System.Convert]::FromBase64String
uses 2.6 GB of RAM alone and takes 17 seconds.
I'll cross-post this in the dotnet/runtime issue. Thank you all for the feedback.
Updated test script:
$name = "random.bin"
$start = Get-Date
Write-Host "Creating Path to $name test file: " -NoNewline
$now = Get-Date
$file = Join-Path -Path $PSScriptRoot -ChildPath $name
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Reading all file bytes into memory: " -NoNewline
$now = Get-Date
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Converting file bytes to base64 string: " -NoNewline
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Converting base64 string back to file bytes: " -NoNewline
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Write-Host "Test complete"
Write-Host "Total duration: $(((Get-Date) - $start).TotalMilliseconds) ms"
I just tested this same implementation using a C# console application for the dotnet/runtime team and the issues does NOT occur when running in a console application against .NET 8 on the latest SDK on Windows 11 Professional. My test results and C# code are here: https://github.com/dotnet/runtime/issues/101061#issuecomment-2059277320
It seems that this is actually a memory leak in the PowerShell runtime for some reason. The dotnet/runtime crew was asking if the [System.Convert]::FromBase64String
function was being replaced with something else by PowerShell, but since I haven't tweaked my PowerShell installation, I can't imagine what could be doing that. Plus, this issue happens on multiple PCs, my desktop and my laptop, and only within PowerShell 7.4.2.
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
For reference, the random.bin
test file I am using is exactly 233,420,544 bytes in length. I generated it with this script:
param(
[int]
$Size
)
$blockSize = 256
$rand = [System.Random]::new()
$total = 0
[byte[]]$data = [System.Array]::CreateInstance( [byte[]], $blockSize )
$path = Join-Path -Path $PSScriptRoot -ChildPath "random.bin"
if( Test-Path $path ) {
Remove-Item -Path $path -Force
}
$file = [System.IO.File]::OpenWrite( $path )
while( $total -lt $Size ) {
$rand.NextBytes( $data )
$file.Write( $data, 0, $data.Length )
$total += $blockSize
}
$file.Flush()
$file.Close()
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME
shows files like coreclr.dll
; I think that means PowerShell has its own copy of the .NET Runtime.
[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription
is ".NET 8.0.4".
A couple more data points regarding the unexpected slowdown:
I see the slowdown only on Windows (neither on macOS nor on Linux), both in in 7.4.2 (.NET 8.0.4) and v7.5.0-preview.2 (.NET 9.0.0-preview.1.24080.9).
On Windows, I see the slowdown in 7.3.10 (.NET 7.0.14) too, albeit in less severe form (about twice as fast as 7.4.2 / v7.5.0-preview.2, but still way too slow); WinPS is fine.
In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
, there is class SuspiciousContentChecker
, which attempts to detect "suspicious strings" such as FromBase64String
. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.
I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.
I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?
With PowerShell 7.4.2 installed from Microsoft Store,
Get-ChildItem $PSHOME
shows files likecoreclr.dll
; I think that means PowerShell has its own copy of the .NET Runtime.
[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription
is ".NET 8.0.4".
I am also seeing .NET 8.0.4 for the FrameworkDescription
on my PowerShell 7 install which I setup from the MSI downloaded from Github. Assembly file versions are 8.0.424.16909.
I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.
The slow operation is the call to [System.Convert]::FromBase64String
which occurs after the file is loaded from disk into memory. On my desktop, this operation takes 17 seconds. The file load operation from disk takes 0.06 seconds.
Also, this only happens when doing this in PowerShell 7.4.2. Running this same test in a C# console application takes under 1 second.
In
src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs
, there isclass SuspiciousContentChecker
, which attempts to detect "suspicious strings" such asFromBase64String
. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.
I added a Program Setting for pwsh.exe
into Exploit Protection under App & Browser Control and disabled all protections. Rerunning the test resulted in the same duration. 17 seconds and 3.4 GB of RAM usage. Is there another place I can check in Windows Security to prevent Defender from watching pwsh.exe
?
The excess memory consumption of the pwsh
process suggests that there is extra code running within the process; rather than antivirus software examining it from the outside (from another process or from a kernel-mode driver).
During the slow FromBase64String operation, is the pwsh
process consuming a lot of processor time (one thread's worth)?
Can you attach to an unmanaged-code debugger to the process during FromBase64String and get a stack trace of the thread with the most CPU time? (!runaway
, k
, Thread Syntax)
Here is a stack trace of the thread with the most CPU time. I used WinDbg and broke process execution in the middle of the FromBase64String operation. You can see some fun stuff at the top.
0:009> ~21 k
# Child-SP RetAddr Call Site
00 0000007b`a638e6d8 00007ffc`5b790a10 MPCLIENT!MpUpdateServicePingRpc+0x7b676
01 0000007b`a638e6e0 00007ffc`5b7cfde4 MPCLIENT!MpTelemetryUpdateUserConsent+0x190
02 0000007b`a638e720 00007ffc`5b7c5db0 MPCLIENT!MpConveyUserChoiceForSampleList+0x704
03 0000007b`a638e800 00007ffc`68253f92 MPCLIENT!MpAmsiNotify+0x140
04 0000007b`a638e8d0 00007ffc`682eb5fd MpOav!DllRegisterServer+0x1142
05 0000007b`a638e930 00007ffc`682e81cb amsi!CAmsiAntimalware::Notify+0xcd
06 0000007b`a638e9c0 00007ffb`95ec173b amsi!AmsiNotifyOperation+0xab
07 0000007b`a638ea10 00007ffb`f1b36cfb 0x00007ffb`95ec173b
08 0000007b`a638eae0 00007ffb`f1acfc4b System_Management_Automation!System.Management.Automation.AmsiUtils.WinReportContent+0xeb
09 0000007b`a638eb60 00007ffb`95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
0a 0000007b`a638ec90 00007ffb`fa945cf6 0x00007ffb`95964a7c
0b 0000007b`a638ecf0 00007ffb`f1f61b9f System_Linq_Expressions!System.Dynamic.UpdateDelegates.UpdateAndExecute2<System.Type,object,object>+0x1f6 [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/UpdateDelegates.Generated.cs @ 268]
0c 0000007b`a638ed80 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.DynamicInstruction<System.Type,object,object>.Run+0xff
0d 0000007b`a638ee10 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0e 0000007b`a638ee90 00007ffb`f1bb20d3 System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0f 0000007b`a638ef10 00007ffb`f1fa3e06 System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
10 0000007b`a638ef60 00007ffb`f1ad841d System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
11 0000007b`a638efe0 00007ffb`f1ad7e0d System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
12 0000007b`a638f070 00007ffb`f19fff15 System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
13 0000007b`a638f0e0 00007ffb`f1cbd0ed System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
14 0000007b`a638f130 00007ffb`f1cbcdc9 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
15 0000007b`a638f1b0 00007ffb`f1ac7eab System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
16 0000007b`a638f230 00007ffb`9525b18e System_Management_Automation!System.Management.Automation.PipelineOps.InvokePipeline+0x33b
17 0000007b`a638f2d0 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.ActionCallInstruction<object,bool,System.Management.Automation.CommandParameterInternal[][],System.Management.Automation.Language.CommandBaseAst[],System.Management.Automation.CommandRedirection[][],System.Management.Automation.Language.FunctionContext>.Run+0x21e
18 0000007b`a638f380 00007ffb`f1bac64e System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
19 0000007b`a638f400 00007ffb`f1bb20d3 System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
1a 0000007b`a638f480 00007ffb`f1fa3e06 System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
1b 0000007b`a638f4d0 00007ffb`f1ad841d System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
1c 0000007b`a638f550 00007ffb`f1ad7e0d System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
1d 0000007b`a638f5e0 00007ffb`f19fff15 System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
1e 0000007b`a638f650 00007ffb`f1cbd0ed System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
1f 0000007b`a638f6a0 00007ffb`f1cbcdc9 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
20 0000007b`a638f720 00007ffb`f1bd1117 System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
21 0000007b`a638f7a0 00007ffb`f1bd1923 System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper+0x507
22 0000007b`a638f850 00007ffb`f1bd2a7f System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc+0x113
23 0000007b`a638f8b0 00007ffb`f2c763cd System_Management_Automation!System.Management.Automation.Runspaces.PipelineThread.WorkerProc+0x2f
24 0000007b`a638f8e0 00007ffb`f4d6b8d3 System_Private_CoreLib!System.Threading.ExecutionContext.RunInternal+0x7d [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 179]
25 0000007b`a638f950 00007ffb`f4c3ebac coreclr!CallDescrWorkerInternal+0x83 [D:\a\_work\1\s\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 100]
26 0000007b`a638f990 00007ffb`f4d57b93 coreclr!DispatchCallSimple+0x60 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221]
27 0000007b`a638fa20 00007ffb`f4cc4abd coreclr!ThreadNative::KickOffThread_Worker+0x63 [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 158]
28 (Inline Function) --------`-------- coreclr!ManagedThreadBase_DispatchInner+0xd [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7222]
29 0000007b`a638fa80 00007ffb`f4cc49d3 coreclr!ManagedThreadBase_DispatchMiddle+0x85 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7266]
2a 0000007b`a638fb60 00007ffb`f4cc4b6e coreclr!ManagedThreadBase_DispatchOuter+0xab [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7425]
2b (Inline Function) --------`-------- coreclr!ManagedThreadBase_FullTransition+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7470]
2c (Inline Function) --------`-------- coreclr!ManagedThreadBase::KickOff+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7505]
2d 0000007b`a638fc00 00007ffc`738e257d coreclr!ThreadNative::KickOffThread+0x7e [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 230]
2e 0000007b`a638fc60 00007ffc`7508aa48 KERNEL32!BaseThreadInitThunk+0x1d
2f 0000007b`a638fc90 00000000`00000000 ntdll!RtlUserThreadStart+0x28
09 0000007b
a638eb60 00007ffb
95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
Oh, LogMemberInvocation calls ArgumentToString here: https://github.com/PowerShell/PowerShell/blob/8ea1598964590b7551ffacda2d2007b94da1e5fa/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs#L3660
So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design. And then perhaps the Defender implementation of AMSI makes a few more copies of the string.
This PowerShell code would apparently log the whole AMSI scan request to the console if you set __PSDumpAMSILogContent=1
in the environment before you start PowerShell.
Why does the AMSI scan take that long, though… does it do useful work all that time, or does it get stuck somehow and give up after a timeout? Perhaps you could try with files of different sizes, graph how the file size affects the FromBase64String duration. If the duration stays the same, then that suggests there is a timeout.
I ran a test using variable size byte arrays with random payloads starting at 2 MiB in size and going up to 116 MiB in size. You can see that the duration required is linear, and also extremely slow. It takes 2 seconds to convert 32 MiB back to a byte array from a base64 string.
The same test conducted at 16 MiB intervals up to 256 MiB also shows a linear trend.
One final test at 32 MiB intervals up to 384 MiB shows a linear trend as well suggesting that there may be no upper boundary or timeout no matter how much data you ask PowerShell to convert from base64.
So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design.
Or perhaps you might be completely horrified by the idea of deep packet inspection of all arguments and no knowledge of whether that will be sent to 3rd parties. ( or any other party at all, to be honest )
I hoped the graph might show a lower boundary, because it could indicate a configuration error that could then be fixed to speed up the operation; for example, if the AMSI code running in-process were unable to contact the Defender service and spent some constant amount of time attempting that. Alas, the linear graph doesn't look like that's the case.
There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster. But if such a workaround becomes commonly used, I suspect a future version of PowerShell will be changed to scan the data anyway.
If I am understanding this nonsense with AMSI correctly, then a solution would be to perform the Base64 translation in a compiled C# cmdlet. Given we are talking PowerShell it should be implemented using a pipeline with System.Security.Cryptography.ToBase64Transform
rather than dealing with massive strings. In this case the biggest string would be 64 characters
AMSI logging of method invocations was added as an experimental feature in https://github.com/PowerShell/PowerShell/pull/16496 and changed to non-experimental in https://github.com/PowerShell/PowerShell/pull/18041. I'm not sure it even uses the suspicious content detector; perhaps the difference between ToBase64String and FromBase64String is that ArgumentToString does not format the elements of a byte[] argument for AMSI, but passes a string argument through.
A slowdown was previously reported in https://github.com/PowerShell/PowerShell/issues/19431.
I am not seeing similar times
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Took 442.478 ms on a little Intel(R) Core(TM) i3-10100Y CPU @ 1.30GHz 1.61 GHz running Windows 11 Pro
AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041.
Where can I find documentation on what this actually does? I am personally horrified by the idea that anyone thinks they have the rights to log data that was private to a process without their knowledge.
When I say POWERSHELL_TELEMETRY_OPTOUT=1
I mean it!
I am not seeing similar times
@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.
Where can I find documentation on what this actually does?
The best may be the documentation of the PSAMSIMethodInvocationLogging experimental feature in this old version: https://github.com/MicrosoftDocs/PowerShell-Docs/blob/793ed5c687e6c7b64565d1751c532eb1d7d84209/reference/docs-conceptual/learn/experimental-features.md#psamsimethodinvocationlogging
The "How AMSI helps" link in that documentation doesn't work on GitHub; use https://learn.microsoft.com/windows/win32/amsi/how-amsi-helps instead.
AMSI doesn't necessarily involve telemetry that would send the data off the machine. I don't know whether Windows Defender has telemetry for AMSI scans.
GitHubThe official PowerShell documentation sources. Contribute to MicrosoftDocs/PowerShell-Docs development by creating an account on GitHub.
As an application developer, you can actively participate in malware defense. Specifically, you can help protect your customers from dynamic script-based malware, and from non-traditional avenues of cyber attack.
There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster.
Because ArgumentToString does not recognise the char[] type and returns only the type name, I think a [System.Convert]::FromBase64CharArray call should be much faster for AMSI to scan than [System.Convert]::FromBase64String. But who knows how long that will remain so.
@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.
Thanks for that., well spotted. I have updated as
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes = $null
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
Running on Windows 11 I now get
480.4803 ms
65573.3259 ms
but running on WSL Debian 12 on same machine
556.7739 ms
2534.7992 ms
So this is an issue only on Windows
Sure, AMSI is part of Windows API and doesn't exist in Linux.
@daxian-dbw @SteveL-MSFT Who could review the method for performance? There is an issue with large arguments. It seems it makes no sense to send full large file content to amsi.
@daxian-dbw @SteveL-MSFT Who could review the method for performance?
It is using StringBuilder, so that is good
ArgumentToString()
does not do any escaping of the string values, so if the string contained >, <
or similar then you have the equivalent of SQL injection,
3 Using ($"<{value}>"); and StringBuilder, choose one mechanism of the other. eg
argsBuilder.Append('<').Append(value).Append('>');
StringBuilder has appenders for primitive types, so no need to convert those to a string and append when StringBuilder can append ints, longs and bools more efficiently.
No exception is made for any cryptographic material or SecureString
the equivalent of SQL injection
Public AMSI documentation does not specify any particular format for the string. I expect that escaping would do more harm than good, as it could prevent the AMSI provider from recognizing substrings that are known to be used by malware.
The argsBuilder.Append($"<{value}>")
call goes to StringBuilder.Append(StringBuilder.AppendInterpolatedStringHandler handler), which should be optimal already. Actual IL in LogMemberInvocation:
IL_0032: ldc.i4.2
IL_0033: ldc.i4.1
IL_0034: ldloc.s V_6
IL_0036: newobj instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::.ctor(int32,
int32,
class [System.Runtime]System.Text.StringBuilder)
IL_003b: stloc.s V_7
IL_003d: ldloca.s V_7
IL_003f: ldstr "<"
IL_0044: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
IL_0049: ldloca.s V_7
IL_004b: ldloc.s V_5
IL_004d: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendFormatted(string)
IL_0052: ldloca.s V_7
IL_0054: ldstr ">"
IL_0059: call instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
IL_005e: ldloca.s V_7
IL_0060: callvirt instance class [System.Runtime]System.Text.StringBuilder [System.Runtime]System.Text.StringBuilder::Append(valuetype [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler&)
IL_0065: pop
It might be possible to save a little by replacing the subsequent string content = $"<{targetName}>.{name}({argsBuilder})";
formatting with more appends to the same StringBuilder; but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.
I noticed this processing delay with [System.Convert]::FromBase64String
is also happening with several other CLR functions which fill and copy arrays, most notably [System.Random]::NextBytes
and [System.Array]::Copy
the first of which is an instance method on the System.Random
class. Both methods appear to have a linear duration increase as the size of the byte[]
increases.
I modified my test script to use [System.Convert]::FromBase64CharArray
which runs substantially faster than FromBase64String and in the process I noticed the NextBytes and Array::Copy methods were taking an increasingly long time to run. Here is my updated test script which outputs the timing results to a CSV file which can be pulled into Power BI for reporting.
# Create a random number generator and results collection
$rand = [System.Random]::new()
$testResults = @()
$iterations = 4 * 1024
$step = 1 * 1024
# Fill a byte[] with random data for testing
[byte[]]$randomData = [System.Array]::CreateInstance( [byte[]], $iterations * $step )
$rand.NextBytes( $randomData )
# Run a test in $step KiB increments
1..$iterations | ForEach-Object {
# Write a progress window
Write-Progress -Activity "Testing Base64 Conversion" -Status "Iteration $_ of $iterations" -PercentComplete (($_ / $iterations) * 99.9)
# Create a test result
$dataLength = ($_ * $step)
$result = [PSCustomObject]@{
Length = $dataLength
FillMs = 0
ToBase64Ms = 0
FromBase64Ms = 0
}
# Create an array of data
$start = Get-Date
[byte[]]$data = [System.Array]::CreateInstance( [byte[]], $dataLength )
[System.Array]::Copy( $randomData, 0, $data, 0, $dataLength )
$result.FillMs = ((Get-Date) - $start).TotalMilliseconds
# Run conversion tests and track timing
$start = Get-Date
$base64 = [System.Convert]::ToBase64String( $data )
$result.ToBase64Ms = ((Get-Date) - $start).TotalMilliseconds
$start = Get-Date
$chars = $base64.ToCharArray()
$bytes = [System.Convert]::FromBase64CharArray( $chars, 0, $chars.Length )
$result.FromBase64Ms = ((Get-Date) - $start).TotalMilliseconds
# Record results for output and graphing
$testResults += $result
}
# Output the results to CSV
$testResults | Export-Csv "base64-results.csv" -Force
but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.
I don't think it is so much the matter of time but heap usage, the StringBuilder will have another copy of the base64 data, then the construction of content will do similar.
I suggest you can do it without formatting or StringBuilder and only allocate the memory for the final string once.
var argsBuilder = new string[5+args.Length*2];
int i = 0;
argsBuilder[i++] = "<";
argsBuilder[i++] = targetName;
argsBuilder[i++] = ">.";
argsBuilder[i++] = name;
if (args.Length > 0)
{
foreach (var arg in args)
{
argsBuilder[i++] = i==5 ? "(<" : ">, <";
argsBuilder[i++] = ArgumentToString(arg);
}
argsBuilder[i++] = ">)";
}
else
{
argsBuilder[i++] = "()";
}
string content = String.Join(null, argsBuilder);
Apart from the list of strings themselves in argsBuilder the final string is allocated only in String.Join() which determines the total length first.
Sure, AMSI is part of Windows API and doesn't exist in Linux.
Not quite. The logging is still done on Linux, whether AMSI exists or not. This was run on 7.4.2 on Linux
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
Gives
=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===
My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging
nonsense.
Original timings
218.4235 ms
64152.6876 ms
now
222.2945 ms
301.4766 ms
Code
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
trap
{
throw $PSItem
}
$ErrorActionPreference = 'Stop'
$code = @"
using System;
using System.Management.Automation;
[Cmdlet("ConvertFrom", "Base64String")]
public class ConvertFromBase64String : PSCmdlet
{
[Parameter(Mandatory=true,ValueFromPipeline=true)]
public String InputString;
protected override void ProcessRecord()
{
WriteObject(System.Convert.FromBase64String(InputString));
}
}
[Cmdlet("ConvertTo", "Base64String")]
public class ConvertToBase64String : PSCmdlet
{
[Parameter(Mandatory=true,ValueFromPipeline=true)]
public byte[] InputObject;
protected override void ProcessRecord()
{
WriteObject(System.Convert.ToBase64String(InputObject));
}
}
"@
Add-Type $code -PassThru | ForEach-Object { Import-Module $_.Assembly }
Get-Command -Noun 'Base64String'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$bytes.Length
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = @(,$bytes) | ConvertTo-Base64String
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$base64.Length
$bytes = $null
$now = Get-Date
$bytes = $base64 | ConvertFrom-Base64String
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes.Length
My workaround is to use two cmdlets to do the Base64 conversions hence bypass the
PSAMSIInvocationLogging
nonsense.
Thank you, this is very helpful. I was fiddling with using C# and Add-Type, but I noticed that making the call directly in C# via a static function still invokes the AMSI context. I will incorporate this into my PowerPass module to avoid the performance issue and excessive memory usage.
@chopinrlz, it isn't calls from C# that trigger an AMSI scan, it is .NET method calls from PowerShell code.
@rhubarb-geek-nz's workaround based on using C# code to implement cmdlets avoids this by letting PowerShell itself mediate the method calls.
Two asides:
$env:__PSDumpAMSILogContent='1'
isn't effective in-session; the env. var. must be set before calling PowerShell.
$bytes = new-object byte[] -ArgumentList @(,200554320)
can be simplified to
$bytes = [byte[]]::new(200554320)
, which also avoids an (invisible) [psobject]
wrapper that would cause an (unrelated) problem with a call such as ConvertTo-Base64String -InputObject $bytes
- see #21496
$env:__PSDumpAMSILogContent='1'
isn't effective in-session; the env. var. must be set before calling PowerShell.
Not actually true, it merely needs to be set before the first call to get DumpLogAMSIContent
, the environment variable is accessed by a lazy load. Eg #21492
private static readonly Lazy<bool> DumpLogAMSIContent = new Lazy<bool>(
() => {
object result = Environment.GetEnvironmentVariable("__PSDumpAMSILogContent");
if (result != null && LanguagePrimitives.TryConvertTo(result, out int value))
{
return value == 1;
}
return false;
}
);
$bytes = new-object byte[] -ArgumentList @(,200554320)
can be simplified to$bytes = [byte[]]::new(200554320)
, which also avoids an (invisible)[psobject]
wrapper that would cause an (unrelated) problem ....
Of course, how could we not have invisible, non-obvious problems in the simplest of code.
@rhubarb-geek-nz:
Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:
$null, 1 | % {
Write-Host ---
$env:__PSDumpAMSILogContent = $_
pwsh -noprofile { [byte[]]::new(0) }
}
$env:__PSDumpAMSILogContent = $null
Of course, how could we not have invisible, non-obvious problems in the simplest of code.
I assume this is pure sarcasm (which I do not endorse, but I empathize with the frustration I presume to underlie it); if there's an actual argument in there (beyond what #21496 expresses), please tell us.
Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:
The script was predictable because it had the "#!/usr/bin/env pwsh" at the start, the executable it bit set, and was designed to run directly from bash. It sets the environment variable after powershell has started but before the first reflection invocation.
if there's an actual argument in there, please tell us.
The frustration is because everytime you think you have found the solution with PowerShell, there is always another reason, case, exception or scenario where it breaks. As a user you don't have the tools to see all these problems because the very objects themselves play stupid games trying to pretend to be something they are not, or changing from what you thought it should have been. I can only assume I am not the target audience for this tool despite it supposedly being for system administrators, developers and IT professionals.
The script was predictable
No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent
already set before invocation, a call to [byte[]]::new(0)
will not dump the diagnostic ASMI information, but it will do in the former case.
there is always another reason, case, exception or scenario where it breaks.
Again I empathize. But simply venting your frustration isn't the way forward.
In the case at hand I've (indirectly) pointed to the (ultimate) root cause of the underlying problem - #5579. I suggest channeling your frustration into constructive feedback - while being cognizant that such feedback may or may no be heard.
The script was predictable
No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable
__PSDumpAMSILogContent
already set before invocation, a call to[byte[]]::new(0)
will not dump the diagnostic ASMI information, but it will do in the former case.
Yes, you are absolutely right. I wasn't predictable because you might have been using PowerShell as your default shell to launch scripts. Whereas all other UNIX shell scripts start a new script with executable bit set in a new process, we are talking about PowerShell here. Sigh.
Perhaps a recommendation for running test scripts is "In a new process....", not "In whatever process with whatever indeterminate state you happen to have....."
I wasn't predictable because you might have been using PowerShell as your default shell
I was, but that is irrelevant: the only thing that matters is whether the calling process had a
See below.__PSDumpAMSILogContent
variable defined or not, so this equally applies to POSIX-compatible shells.
P.S.: @rhubarb-geek-nz:
I haven't looked into why the apparent attempt at honoring an in-process definition of the variable (private static readonly Lazy<bool> DumpLogAMSIContent
) doesn't work reliably.
DumpLogAMSIContent.Value
based on whether env. var. __PSDumpAMSILogContent
is defined (and set to 1
) then. In an interactive session, it is invariably the PSReadLine
module that is the first to make such a call ([Microsoft.PowerShell.PSConsoleReadLine]::ReadLine()
), so that subsequent in-process attempts to set __PSDumpAMSILogContent
are ineffective.However, generally speaking, in cases where PowerShell honors environment variables, they are expected to be set before PowerShell is launched.
$env:PSModulePath
is honored dynamically, on every access; the lazy once-per session initialization of DumpLogAMSIContent
is an unfortunate hybrid between static and dynamic behavior.the only thing that matters is whether the calling process had a
__PSDumpAMSILogContent
variable defined or not, so this equally applies to POSIX-compatible shells.
I beg to offer a different opinion...
$ ls -ld new.ps1
-rwxr-xr-x 1 me users 139 Apr 18 00:57 new.ps1
$ cat new.ps1
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes1024 = new-object byte[] -ArgumentList @(,1024)
$bytes2048 = [byte[]]::new(2048)
Scenario A - The environment variable is not set in the calling process
$ echo $__PSDumpAMSILogContent
$ ./new.ps1
=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===
Scenario B - it is set to 0 in the calling process
$ __PSDumpAMSILogContent=0
$ echo $__PSDumpAMSILogContent
0
$ ./new.ps1
=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===
You're right - via the CLI (as implicitly used via a shebang-based executable shell script), the in-process setting is honored, if:
-File
or -Command
for execute-and-exit functionality.$env:__PSDumpAMSILogContent = 1
is set before any in-session .NET method calls occur from PowerShell code.A simpler demonstration: Start a pristine POSIX-compatible shell and run the following:
export -n __PSDumpAMSILogContent # ensure that the env. var. isn't defined.
# AMSI log output via env. var. defined BEFORE
__PSDumpAMSILogContent=1 pwsh -noprofile -c '$null = [byte[]]::new(2048)'
# !! Produces AMSI output too, because the environment variable - despite being set in-session - is
# !! set *before the first method call*.
pwsh -noprofile -c '$env:__PSDumpAMSILogContent = 1; $null = [byte[]]::new(2048)'
Note:
From inside PowerShell, an executable shell script with extension .ps1
is still executed in-process - and using a filename extension with an executable shell script is generally ill-advised.
The bigger picture here is: To make PowerShell configuration environment variables work predictably, set them before invoking PowerShell, irrespective of the invocation mechanism.
Rather than making a cmdlet for every .NET method you wish to call, you can simply put reflection in a single cmdlet.
$bytes = [byte[]]@(1,2,3)
$base64 = [string](Invoke-Reflection -Method ToBase64String -Type ([System.Convert]) -ArgumentList @(,$bytes))
Invoke-Reflection -Method FromBase64String -Type ([System.Convert]) -ArgumentList @(,$base64) | Format-Hex
and using a filename extension with an executable shell script is generally ill-advised.
Really? That is one I have not heard of.... eg
$ ls -ld *.sh
-rwxr-xr-x 1 github users 772 May 18 2023 debug.sh
-rw-r--r-- 1 github users 107 May 18 2023 download.sh
-rwxr-xr-x 1 github users 2094 May 18 2023 generate-icns.sh
-rwxr-xr-x 1 github users 7418 May 18 2023 install-powershell.sh
-rwxr-xr-x 1 github users 7307 May 18 2023 installpsh-amazonlinux.sh
-rwxr-xr-x 1 github users 9229 May 18 2023 installpsh-debian.sh
-rwxr-xr-x 1 github users 7791 May 18 2023 installpsh-gentoo.sh
-rw-r--r-- 1 github users 7533 May 18 2023 installpsh-mariner.sh
-rwxr-xr-x 1 github users 6483 May 18 2023 installpsh-osx.sh
-rwxr-xr-x 1 github users 6425 May 18 2023 installpsh-redhat.sh
-rwxr-xr-x 1 github users 9081 May 18 2023 installpsh-suse.sh
If you mean executable PowerShell scripts without the ps1 extension, we know how that ends up.
Another alternative is to do the reflection directly in PowerShell itself
ToBase64String
is
$method = ([System.Convert]).GetMethod('ToBase64String',[type[]]@(,([byte[]])))
$base64 = [string]($method.Invoke($null,@(,$bytes)))
FromBase64String
is
$method = ([System.Convert]).GetMethod('FromBase64String',[type[]]@(,([string])))
$bytes = $method.Invoke($null,@(,$base64))
Then the AMSI logging just looks like
=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<ToBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<FromBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===
=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===
Where the arguments are not dumped because all it prints is System.Object[]
Really? That is one I have not heard of...
Unfortunately, many ill-advised practices are common.
An executable shell script (using a shebang line) is an executable like any other, and there is no benefit to signaling to a caller that a given executable happens to be a shell script, which is (a) an implementation detail and (b) may lead users to believe that sh <script>.sh
should be used for invocation, which can fail if the script uses Bashisms, for instance.
With PowerShell, specifically, things get tricky (leaving the bug you mention aside), because, unlike analogous shell scripts for POSIX-compatible shells, an executable, shebang line-based .ps1
file is still executed in-process, with the potential to alter the session state. An executable, shebang line-based .ps1
file must therefore be designed with this in mind.
One without this extension consistently runs in a child process - albeit more slowly and at the expense of not having rich type support in the in- and output and the inability to pass array arguments and arguments that have no string-literal representations - but a PowerShell script that is designed to (also) run as a standalone executable should not rely on these features anyway.
I presume it is the latter limitations that explain why - at least in my perception - shebang line-based PowerShell scripts haven't really caught on and why bugs such as #21402 are still not fixed.
Prerequisites
Steps to reproduce
This was tested on PowerShell 7.4.2
NOTE: That if you test this with a newer version of the .NET 8.0 installer, you may have to modify the test script to pick the correct file for the test since the filename is hard coded on line 3.
The .NET 8.0 installer for Windows x64 is approximately 222 MB in size. Reading into memory and converting to base64 then converting back should require about 790 MB of RAM with all variables remaining in scope during the process and no garbage collection happening or object disposal happening. The observed behavior appears to be memory-leak related as the amount of memory used once the conversion eventually completes is about 3.4 GB of RAM. These data points can be see in the attached screen shots.
Expected behavior
Actual behavior
Error details
No response
Environment data
Visuals
Testing in PowerShell 7.4.2
Testing in PowerShell 5.1
PowerShell 7.4.2 Memory Usage
PowerShell 5.1 Memory Usage