PowerShell / PowerShell

PowerShell for every system!
https://microsoft.com/PowerShell
MIT License
43.55k stars 7.06k forks source link

[System.Convert]::FromBase64String causes memory leak with large strings #21473

Open chopinrlz opened 1 month ago

chopinrlz commented 1 month ago

Prerequisites

Steps to reproduce

This was tested on PowerShell 7.4.2

  1. Download the .NET 8.0 installer for Windows x64 to use as the test file, the direct link to this is here: https://dotnet.microsoft.com/en-us/download/dotnet/thank-you/sdk-8.0.204-windows-x64-installer
  2. Save this file into a folder on your computer somewhere
  3. Create a PowerShell script with the following contents in the same folder as the .NET 8.0 installer:
Get-Date
Write-Host "Creating Path to dotnet.exe test file"
$file = Join-Path -Path $PSScriptRoot -ChildPath "dotnet.exe"
Write-Host "Reading all file bytes into memory"
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host "Converting file bytes to base64 string"
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host "Converting base64 string back to file bytes"
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host "Test complete"
Get-Date

NOTE: That if you test this with a newer version of the .NET 8.0 installer, you may have to modify the test script to pick the correct file for the test since the filename is hard coded on line 3.

  1. Open a PowerShell window in the folder with the script and test file
  2. Run the PowerShell script
  3. Open Task Manager and observe the memory usage of PowerShell
  4. Note the time required to complete the conversion from Base64 and the usage of upwards of 3.5 GB of RAM to do so

The .NET 8.0 installer for Windows x64 is approximately 222 MB in size. Reading into memory and converting to base64 then converting back should require about 790 MB of RAM with all variables remaining in scope during the process and no garbage collection happening or object disposal happening. The observed behavior appears to be memory-leak related as the amount of memory used once the conversion eventually completes is about 3.4 GB of RAM. These data points can be see in the attached screen shots.

Expected behavior

When you run the same script in PowerShell 7 and Windows PowerShell 5.1, you see two very different behaviors:

In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB
In PowerShell 5.1, the time to complete is 7 seconds and memory used is 1.0 GB

This suggests there is an error in the PowerShell 7.4.2 / .NET 8.0 implementation.

Actual behavior

In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB.

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.22631
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

Testing in PowerShell 7.4.2

ps7-test

Testing in PowerShell 5.1

ps5-test

PowerShell 7.4.2 Memory Usage

ps7-mem

PowerShell 5.1 Memory Usage

ps5-mem
rhubarb-geek-nz commented 1 month ago

Hi,

Given all of your file and data operations are performed directly with the .NET API I suggest this is not a PowerShell problem, it looks like it is a .NET issue.

Is your issue that PowerShell is not running the garbage collector? Does running

[System.GC]::Collect()

make any difference?

Also be aware that managed memory runtimes often take memory from the OS in order to allocate objects but never return it. So the objects may have been released/disposed/freed so the actual CLR heap is free but the memory has not been returned to the OS. This is completely normal as managed runtimes (CLR, JVM etc) assume that if they needed it once they are likely to need it again so no point giving it back to the OS.

You would need to look for tools to examine the state of the CLR heap within the process rather than external process monitoring tools.

When you have an operation that you know is memory intensive then two other options are available,

  1. run the operation in a separate process, in PowerShell all you need to do is run it in a Job and PowerShell will do the rest.
  2. stream the operation and read and write chunks of data rather than holding it all in memory.

I hope that helps.

jborean93 commented 1 month ago

I cannot replicate the slowness you see as 7.4.2 takes less than 5 seconds for me and is in fact faster than WinPS but I do notice the large memory usage. My guess is that it is not only are now storing the 2 byte arrays (raw file data and the decoded base64 string) but also the base64 string is all allocated on the heap as part of the operation. Potentially WinPS/.NET Framework is more aggressive in reusing the array values but as per the above the CLR could be allocating the memory and just never freeing it so it can more efficient reuse the memory in the future.

Putting aside the above comment you can more efficiently base64 encode bytes by streaming it rather than reading all the input bytes into memory.

Function ConvertTo-Base64String {
    [OutputType([string])]
    [CmdletBinding()]
    param (
        [Parameter(Mandatory)]
        [string]$Path
    )

    $fs = $cryptoStream = $sr = $null
    try {
        $fs = [System.IO.File]::OpenRead($Path)
        $cryptoStream = [System.Security.CryptoGraphy.CryptoStream]::new(
            $fs,
            [System.Security.Cryptography.ToBase64Transform]::new(),
            [System.Security.Cryptography.CryptoStreamMode]::Read)
        $sr = [System.IO.StreamReader]::new($cryptoStream, [System.Text.Encoding]::ASCII)
        $sr.ReadToEnd()
    }
    finally {
        ${sr}?.Dispose()
        ${cryptoStream}?.Dispose()
        ${fs}?.Dispose()
    }
}

This will stream the raw bytes from the source file stream and produce the final output string. If you are storing this string into a file then you could optimize it further by streaming the output base64 CryptoStream to a file avoiding having to store all the data in PowerShell.

If you do need to store the base64 string as an object in PowerShell keep in mind this means you not only have to store the inflated size that base64 uses (($length / 3) * 4) but each char of the string takes two bytes so at a minimum you are looking at around 600MB for the dotnet installer. Using the above function I see the memory usage sits around 1.2GB which is less than WinPS (about 1.4GB). While this is still more than the ~600MB there could be other factors in place here like the CLR allocating more memory than strictly needs in anticipation of future needs or some other reason.

chopinrlz commented 1 month ago

Thank you both for your responses and suggestions for optimization. I did reply to the Issue cross-posted in the dotnet/runtime project which you can see here https://github.com/dotnet/runtime/issues/101061#issuecomment-2058103434

I ran [GC]::Collect() after running the test script in both WinPS and PS7 and noticed that in PS7 almost none of the used memory was released back to the operating system, which seemed suspicious as in WinPS almost all memory is released after the test.

UPDATE: Confirmed below that the FromBase64String delay and excessive memory usage only occurs through PowerShell 7, but not through a .NET console app. ~This definitely appears to be a .NET run-time issue and not a PowerShell issue, however I wanted to post it here as well as over there for visibility in case others encounter this oddity while building PowerShell modules (which is how I stumbled upon it).~

I am aware that storing a 200 MB file in memory as base64 text is wildly inefficient, and is not the intention of how my PowerPass module should be used (which is how I discovered this in the first place), but since I stumbled upon this unexpected behavior I thought it prudent to at least report it.

But again, I appreciate all of the comments and feedback here, especially the suggestions for optimization techniques.

237dmitry commented 1 month ago

In PowerShell 7.4.2, the time to complete is 82 seconds and memory used is 3.4 GB.

Yes, the same memory usage, but time to complete is about 1.2 seconds

chopinrlz commented 1 month ago

I retested the following updated script on my desktop PC. A Ryzen 5800X with 128 GB of RAM and PCIe Gen4 NVMe storage. The test ran much faster as expected, but the memory usage still remains high.

Even after invoking [GC]::Collect() around 3.2 GB of RAM still remains utilized by the pwsh process.

Reading the 222 MB file into memory takes 0.06 seconds and converting it to base64 takes 0.22 seconds and uses 845 MB of RAM across both operations as expected. The last operation [System.Convert]::FromBase64String uses 2.6 GB of RAM alone and takes 17 seconds.

ps7-retest-timings

I'll cross-post this in the dotnet/runtime issue. Thank you all for the feedback.

Updated test script:

$name = "random.bin"

$start = Get-Date

Write-Host "Creating Path to $name test file: " -NoNewline
$now = Get-Date
$file = Join-Path -Path $PSScriptRoot -ChildPath $name
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Reading all file bytes into memory: " -NoNewline
$now = Get-Date
$bytes = [System.IO.File]::ReadAllBytes( $file )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting file bytes to base64 string: " -NoNewline
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Converting base64 string back to file bytes: " -NoNewline
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Write-Host "Test complete"

Write-Host "Total duration: $(((Get-Date) - $start).TotalMilliseconds) ms"
chopinrlz commented 1 month ago

I just tested this same implementation using a C# console application for the dotnet/runtime team and the issues does NOT occur when running in a console application against .NET 8 on the latest SDK on Windows 11 Professional. My test results and C# code are here: https://github.com/dotnet/runtime/issues/101061#issuecomment-2059277320

It seems that this is actually a memory leak in the PowerShell runtime for some reason. The dotnet/runtime crew was asking if the [System.Convert]::FromBase64String function was being replaced with something else by PowerShell, but since I haven't tweaked my PowerShell installation, I can't imagine what could be doing that. Plus, this issue happens on multiple PCs, my desktop and my laptop, and only within PowerShell 7.4.2.

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

chopinrlz commented 1 month ago

For reference, the random.bin test file I am using is exactly 233,420,544 bytes in length. I generated it with this script:

param(
    [int]
    $Size
)
$blockSize = 256
$rand = [System.Random]::new()
$total = 0
[byte[]]$data = [System.Array]::CreateInstance( [byte[]], $blockSize )
$path = Join-Path -Path $PSScriptRoot -ChildPath "random.bin"
if( Test-Path $path ) {
    Remove-Item -Path $path -Force
}
$file = [System.IO.File]::OpenWrite( $path )
while( $total -lt $Size ) {
    $rand.NextBytes( $data )
    $file.Write( $data, 0, $data.Length )
    $total += $blockSize
}
$file.Flush()
$file.Close()
KalleOlaviNiemitalo commented 1 month ago

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.

[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".

mklement0 commented 1 month ago

A couple more data points regarding the unexpected slowdown:

KalleOlaviNiemitalo commented 1 month ago

In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.

iSazonov commented 1 month ago

I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.

chopinrlz commented 1 month ago

I'm assuming PowerShell 7.4.2 is using .NET 8.0 under the hood. Does it ship with its own .NET run-time or does it rely on the run-time installed on the system?

With PowerShell 7.4.2 installed from Microsoft Store, Get-ChildItem $PSHOME shows files like coreclr.dll; I think that means PowerShell has its own copy of the .NET Runtime.

[System.Runtime.InteropServices.RuntimeInformation]::FrameworkDescription is ".NET 8.0.4".

I am also seeing .NET 8.0.4 for the FrameworkDescription on my PowerShell 7 install which I setup from the MSI downloaded from Github. Assembly file versions are 8.0.424.16909.

chopinrlz commented 1 month ago

I would like to remind that earlier we observed slow pwsh operations with files due to antivirus.

The slow operation is the call to [System.Convert]::FromBase64String which occurs after the file is loaded from disk into memory. On my desktop, this operation takes 17 seconds. The file load operation from disk takes 0.06 seconds.

Also, this only happens when doing this in PowerShell 7.4.2. Running this same test in a C# console application takes under 1 second.

console-app-test

chopinrlz commented 1 month ago

In src/System.Management.Automation/engine/runtime/CompiledScriptBlock.cs, there is class SuspiciousContentChecker, which attempts to detect "suspicious strings" such as FromBase64String. That then apparently causes PowerShell to log some ETW event. I wonder if Windows Defender monitors those events and then spends time investigating the process.

I added a Program Setting for pwsh.exe into Exploit Protection under App & Browser Control and disabled all protections. Rerunning the test resulted in the same duration. 17 seconds and 3.4 GB of RAM usage. Is there another place I can check in Windows Security to prevent Defender from watching pwsh.exe?

KalleOlaviNiemitalo commented 1 month ago

The excess memory consumption of the pwsh process suggests that there is extra code running within the process; rather than antivirus software examining it from the outside (from another process or from a kernel-mode driver).

During the slow FromBase64String operation, is the pwsh process consuming a lot of processor time (one thread's worth)?

Can you attach to an unmanaged-code debugger to the process during FromBase64String and get a stack trace of the thread with the most CPU time? (!runaway, k, Thread Syntax)

chopinrlz commented 1 month ago

Here is a stack trace of the thread with the most CPU time. I used WinDbg and broke process execution in the middle of the FromBase64String operation. You can see some fun stuff at the top.

0:009> ~21 k
 # Child-SP          RetAddr               Call Site
00 0000007b`a638e6d8 00007ffc`5b790a10     MPCLIENT!MpUpdateServicePingRpc+0x7b676
01 0000007b`a638e6e0 00007ffc`5b7cfde4     MPCLIENT!MpTelemetryUpdateUserConsent+0x190
02 0000007b`a638e720 00007ffc`5b7c5db0     MPCLIENT!MpConveyUserChoiceForSampleList+0x704
03 0000007b`a638e800 00007ffc`68253f92     MPCLIENT!MpAmsiNotify+0x140
04 0000007b`a638e8d0 00007ffc`682eb5fd     MpOav!DllRegisterServer+0x1142
05 0000007b`a638e930 00007ffc`682e81cb     amsi!CAmsiAntimalware::Notify+0xcd
06 0000007b`a638e9c0 00007ffb`95ec173b     amsi!AmsiNotifyOperation+0xab
07 0000007b`a638ea10 00007ffb`f1b36cfb     0x00007ffb`95ec173b
08 0000007b`a638eae0 00007ffb`f1acfc4b     System_Management_Automation!System.Management.Automation.AmsiUtils.WinReportContent+0xeb
09 0000007b`a638eb60 00007ffb`95964a7c     System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b
0a 0000007b`a638ec90 00007ffb`fa945cf6     0x00007ffb`95964a7c
0b 0000007b`a638ecf0 00007ffb`f1f61b9f     System_Linq_Expressions!System.Dynamic.UpdateDelegates.UpdateAndExecute2<System.Type,object,object>+0x1f6 [/_/src/libraries/System.Linq.Expressions/src/System/Dynamic/UpdateDelegates.Generated.cs @ 268] 
0c 0000007b`a638ed80 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.DynamicInstruction<System.Type,object,object>.Run+0xff
0d 0000007b`a638ee10 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0e 0000007b`a638ee90 00007ffb`f1bb20d3     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
0f 0000007b`a638ef10 00007ffb`f1fa3e06     System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
10 0000007b`a638ef60 00007ffb`f1ad841d     System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
11 0000007b`a638efe0 00007ffb`f1ad7e0d     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
12 0000007b`a638f070 00007ffb`f19fff15     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
13 0000007b`a638f0e0 00007ffb`f1cbd0ed     System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
14 0000007b`a638f130 00007ffb`f1cbcdc9     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
15 0000007b`a638f1b0 00007ffb`f1ac7eab     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
16 0000007b`a638f230 00007ffb`9525b18e     System_Management_Automation!System.Management.Automation.PipelineOps.InvokePipeline+0x33b
17 0000007b`a638f2d0 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.ActionCallInstruction<object,bool,System.Management.Automation.CommandParameterInternal[][],System.Management.Automation.Language.CommandBaseAst[],System.Management.Automation.CommandRedirection[][],System.Management.Automation.Language.FunctionContext>.Run+0x21e
18 0000007b`a638f380 00007ffb`f1bac64e     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
19 0000007b`a638f400 00007ffb`f1bb20d3     System_Management_Automation!System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run+0x7e
1a 0000007b`a638f480 00007ffb`f1fa3e06     System_Management_Automation!System.Management.Automation.Interpreter.Interpreter.Run+0x33
1b 0000007b`a638f4d0 00007ffb`f1ad841d     System_Management_Automation!System.Management.Automation.Interpreter.LightLambda.RunVoid1<System.Management.Automation.Language.FunctionContext>+0xc6
1c 0000007b`a638f550 00007ffb`f1ad7e0d     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.RunClause+0x28d
1d 0000007b`a638f5e0 00007ffb`f19fff15     System_Management_Automation!System.Management.Automation.DlrScriptCommandProcessor.Complete+0x11d
1e 0000007b`a638f650 00007ffb`f1cbd0ed     System_Management_Automation!System.Management.Automation.CommandProcessorBase.DoComplete+0x85
1f 0000007b`a638f6a0 00007ffb`f1cbcdc9     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.DoCompleteCore+0x9d
20 0000007b`a638f720 00007ffb`f1bd1117     System_Management_Automation!System.Management.Automation.Internal.PipelineProcessor.SynchronousExecuteEnumerate+0xc9
21 0000007b`a638f7a0 00007ffb`f1bd1923     System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeHelper+0x507
22 0000007b`a638f850 00007ffb`f1bd2a7f     System_Management_Automation!System.Management.Automation.Runspaces.LocalPipeline.InvokeThreadProc+0x113
23 0000007b`a638f8b0 00007ffb`f2c763cd     System_Management_Automation!System.Management.Automation.Runspaces.PipelineThread.WorkerProc+0x2f
24 0000007b`a638f8e0 00007ffb`f4d6b8d3     System_Private_CoreLib!System.Threading.ExecutionContext.RunInternal+0x7d [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 179] 
25 0000007b`a638f950 00007ffb`f4c3ebac     coreclr!CallDescrWorkerInternal+0x83 [D:\a\_work\1\s\src\coreclr\vm\amd64\CallDescrWorkerAMD64.asm @ 100] 
26 0000007b`a638f990 00007ffb`f4d57b93     coreclr!DispatchCallSimple+0x60 [D:\a\_work\1\s\src\coreclr\vm\callhelpers.cpp @ 221] 
27 0000007b`a638fa20 00007ffb`f4cc4abd     coreclr!ThreadNative::KickOffThread_Worker+0x63 [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 158] 
28 (Inline Function) --------`--------     coreclr!ManagedThreadBase_DispatchInner+0xd [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7222] 
29 0000007b`a638fa80 00007ffb`f4cc49d3     coreclr!ManagedThreadBase_DispatchMiddle+0x85 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7266] 
2a 0000007b`a638fb60 00007ffb`f4cc4b6e     coreclr!ManagedThreadBase_DispatchOuter+0xab [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7425] 
2b (Inline Function) --------`--------     coreclr!ManagedThreadBase_FullTransition+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7470] 
2c (Inline Function) --------`--------     coreclr!ManagedThreadBase::KickOff+0x28 [D:\a\_work\1\s\src\coreclr\vm\threads.cpp @ 7505] 
2d 0000007b`a638fc00 00007ffc`738e257d     coreclr!ThreadNative::KickOffThread+0x7e [D:\a\_work\1\s\src\coreclr\vm\comsynchronizable.cpp @ 230] 
2e 0000007b`a638fc60 00007ffc`7508aa48     KERNEL32!BaseThreadInitThunk+0x1d
2f 0000007b`a638fc90 00000000`00000000     ntdll!RtlUserThreadStart+0x28
KalleOlaviNiemitalo commented 1 month ago

09 0000007ba638eb60 00007ffb95964a7c System_Management_Automation!System.Management.Automation.MemberInvocationLoggingOps.LogMemberInvocation+0x27b

Oh, LogMemberInvocation calls ArgumentToString here: https://github.com/PowerShell/PowerShell/blob/8ea1598964590b7551ffacda2d2007b94da1e5fa/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs#L3660

So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design. And then perhaps the Defender implementation of AMSI makes a few more copies of the string.

This PowerShell code would apparently log the whole AMSI scan request to the console if you set __PSDumpAMSILogContent=1 in the environment before you start PowerShell.

Why does the AMSI scan take that long, though… does it do useful work all that time, or does it get stuck somehow and give up after a timeout? Perhaps you could try with files of different sizes, graph how the file size affects the FromBase64String duration. If the duration stays the same, then that suggests there is a timeout.

chopinrlz commented 1 month ago

I ran a test using variable size byte arrays with random payloads starting at 2 MiB in size and going up to 116 MiB in size. You can see that the duration required is linear, and also extremely slow. It takes 2 seconds to convert 32 MiB back to a byte array from a base64 string.

data-size-chart

The same test conducted at 16 MiB intervals up to 256 MiB also shows a linear trend.

data-size-chart

One final test at 32 MiB intervals up to 384 MiB shows a linear trend as well suggesting that there may be no upper boundary or timeout no matter how much data you ask PowerShell to convert from base64.

data-size-chart

rhubarb-geek-nz commented 1 month ago

So does that mean the multi-megabyte base64 string goes via Anti-Malware Scan Interface to Windows Defender…? I guess that would be a sensible design.

Or perhaps you might be completely horrified by the idea of deep packet inspection of all arguments and no knowledge of whether that will be sent to 3rd parties. ( or any other party at all, to be honest )

KalleOlaviNiemitalo commented 1 month ago

I hoped the graph might show a lower boundary, because it could indicate a configuration error that could then be fixed to speed up the operation; for example, if the AMSI code running in-process were unable to contact the Defender service and spent some constant amount of time attempting that. Alas, the linear graph doesn't look like that's the case.

There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster. But if such a workaround becomes commonly used, I suspect a future version of PowerShell will be changed to scan the data anyway.

rhubarb-geek-nz commented 1 month ago

If I am understanding this nonsense with AMSI correctly, then a solution would be to perform the Base64 translation in a compiled C# cmdlet. Given we are talking PowerShell it should be implemented using a pipeline with System.Security.Cryptography.ToBase64Transform rather than dealing with massive strings. In this case the biggest string would be 64 characters

KalleOlaviNiemitalo commented 1 month ago

AMSI logging of method invocations was added as an experimental feature in https://github.com/PowerShell/PowerShell/pull/16496 and changed to non-experimental in https://github.com/PowerShell/PowerShell/pull/18041. I'm not sure it even uses the suspicious content detector; perhaps the difference between ToBase64String and FromBase64String is that ArgumentToString does not format the elements of a byte[] argument for AMSI, but passes a string argument through.

A slowdown was previously reported in https://github.com/PowerShell/PowerShell/issues/19431.

rhubarb-geek-nz commented 1 month ago

I am not seeing similar times

$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Took 442.478 ms on a little Intel(R) Core(TM) i3-10100Y CPU @ 1.30GHz 1.61 GHz running Windows 11 Pro

rhubarb-geek-nz commented 1 month ago

AMSI logging of method invocations was added as an experimental feature in #16496 and changed to non-experimental in #18041.

Where can I find documentation on what this actually does? I am personally horrified by the idea that anyone thinks they have the rights to log data that was private to a process without their knowledge.

When I say POWERSHELL_TELEMETRY_OPTOUT=1 I mean it!

KalleOlaviNiemitalo commented 1 month ago

I am not seeing similar times

@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.

Where can I find documentation on what this actually does?

The best may be the documentation of the PSAMSIMethodInvocationLogging experimental feature in this old version: https://github.com/MicrosoftDocs/PowerShell-Docs/blob/793ed5c687e6c7b64565d1751c532eb1d7d84209/reference/docs-conceptual/learn/experimental-features.md#psamsimethodinvocationlogging

The "How AMSI helps" link in that documentation doesn't work on GitHub; use https://learn.microsoft.com/windows/win32/amsi/how-amsi-helps instead.

AMSI doesn't necessarily involve telemetry that would send the data off the machine. I don't know whether Windows Defender has telemetry for AMSI scans.

GitHub
PowerShell-Docs/reference/docs-conceptual/learn/experimental-features.md at 793ed5c687e6c7b64565d1751c532eb1d7d84209 · MicrosoftDocs/PowerShell-Docs
The official PowerShell documentation sources. Contribute to MicrosoftDocs/PowerShell-Docs development by creating an account on GitHub.
How AMSI helps you defend against malware - Win32 apps
As an application developer, you can actively participate in malware defense. Specifically, you can help protect your customers from dynamic script-based malware, and from non-traditional avenues of cyber attack.
KalleOlaviNiemitalo commented 1 month ago

There may be ways to change the PowerShell script so that, even though it still triggers the suspicious content detector and causes an AMSI scan, the argument list being scanned does not include the base64 data and the scan finishes faster.

Because ArgumentToString does not recognise the char[] type and returns only the type name, I think a [System.Convert]::FromBase64CharArray call should be much faster for AMSI to scan than [System.Convert]::FromBase64String. But who knows how long that will remain so.

rhubarb-geek-nz commented 1 month ago

@rhubarb-geek-nz, your script uses ToBase64String, not FromBase64String.

Thanks for that., well spotted. I have updated as

$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)
$now = Get-Date
$base64 = [System.Convert]::ToBase64String( $bytes )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"
$bytes = $null
$now = Get-Date
$bytes = [System.Convert]::FromBase64String( $base64 )
Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

Running on Windows 11 I now get

 480.4803 ms
 65573.3259 ms

but running on WSL Debian 12 on same machine

 556.7739 ms
 2534.7992 ms

So this is an issue only on Windows

KalleOlaviNiemitalo commented 1 month ago

Sure, AMSI is part of Windows API and doesn't exist in Linux.

iSazonov commented 1 month ago

@daxian-dbw @SteveL-MSFT Who could review the method for performance? There is an issue with large arguments. It seems it makes no sense to send full large file content to amsi.

https://github.com/PowerShell/PowerShell/blob/8ea1598964590b7551ffacda2d2007b94da1e5fa/src/System.Management.Automation/engine/runtime/Operations/MiscOps.cs#L3651-L3668

rhubarb-geek-nz commented 1 month ago

@daxian-dbw @SteveL-MSFT Who could review the method for performance?

  1. It is using StringBuilder, so that is good

  2. ArgumentToString() does not do any escaping of the string values, so if the string contained >, < or similar then you have the equivalent of SQL injection,

3 Using ($"<{value}>"); and StringBuilder, choose one mechanism of the other. eg

argsBuilder.Append('<').Append(value).Append('>');
  1. StringBuilder has appenders for primitive types, so no need to convert those to a string and append when StringBuilder can append ints, longs and bools more efficiently.

  2. No exception is made for any cryptographic material or SecureString

KalleOlaviNiemitalo commented 1 month ago

the equivalent of SQL injection

Public AMSI documentation does not specify any particular format for the string. I expect that escaping would do more harm than good, as it could prevent the AMSI provider from recognizing substrings that are known to be used by malware.

The argsBuilder.Append($"<{value}>") call goes to StringBuilder.Append(StringBuilder.AppendInterpolatedStringHandler handler), which should be optimal already. Actual IL in LogMemberInvocation:

    IL_0032:  ldc.i4.2
    IL_0033:  ldc.i4.1
    IL_0034:  ldloc.s    V_6
    IL_0036:  newobj     instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::.ctor(int32,
                                                                                                                        int32,
                                                                                                                        class [System.Runtime]System.Text.StringBuilder)
    IL_003b:  stloc.s    V_7
    IL_003d:  ldloca.s   V_7
    IL_003f:  ldstr      "<"
    IL_0044:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
    IL_0049:  ldloca.s   V_7
    IL_004b:  ldloc.s    V_5
    IL_004d:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendFormatted(string)
    IL_0052:  ldloca.s   V_7
    IL_0054:  ldstr      ">"
    IL_0059:  call       instance void [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler::AppendLiteral(string)
    IL_005e:  ldloca.s   V_7
    IL_0060:  callvirt   instance class [System.Runtime]System.Text.StringBuilder [System.Runtime]System.Text.StringBuilder::Append(valuetype [System.Runtime]System.Text.StringBuilder/AppendInterpolatedStringHandler&)
    IL_0065:  pop

It might be possible to save a little by replacing the subsequent string content = $"<{targetName}>.{name}({argsBuilder})"; formatting with more appends to the same StringBuilder; but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.

chopinrlz commented 1 month ago

I noticed this processing delay with [System.Convert]::FromBase64String is also happening with several other CLR functions which fill and copy arrays, most notably [System.Random]::NextBytes and [System.Array]::Copy the first of which is an instance method on the System.Random class. Both methods appear to have a linear duration increase as the size of the byte[] increases.

I modified my test script to use [System.Convert]::FromBase64CharArray which runs substantially faster than FromBase64String and in the process I noticed the NextBytes and Array::Copy methods were taking an increasingly long time to run. Here is my updated test script which outputs the timing results to a CSV file which can be pulled into Power BI for reporting.

# Create a random number generator and results collection
$rand = [System.Random]::new()
$testResults = @()
$iterations = 4 * 1024
$step = 1 * 1024

# Fill a byte[] with random data for testing
[byte[]]$randomData = [System.Array]::CreateInstance( [byte[]], $iterations * $step )
$rand.NextBytes( $randomData )

# Run a test in $step KiB increments
1..$iterations | ForEach-Object {

    # Write a progress window
    Write-Progress -Activity "Testing Base64 Conversion" -Status "Iteration $_ of $iterations" -PercentComplete (($_ / $iterations) * 99.9)

    # Create a test result
    $dataLength = ($_ * $step)
    $result = [PSCustomObject]@{
        Length = $dataLength
        FillMs = 0
        ToBase64Ms = 0
        FromBase64Ms = 0
    }

    # Create an array of data
    $start = Get-Date
    [byte[]]$data = [System.Array]::CreateInstance( [byte[]], $dataLength )
    [System.Array]::Copy( $randomData, 0, $data, 0, $dataLength )
    $result.FillMs = ((Get-Date) - $start).TotalMilliseconds

    # Run conversion tests and track timing
    $start = Get-Date
    $base64 = [System.Convert]::ToBase64String( $data )
    $result.ToBase64Ms = ((Get-Date) - $start).TotalMilliseconds

    $start = Get-Date
    $chars = $base64.ToCharArray()
    $bytes = [System.Convert]::FromBase64CharArray( $chars, 0, $chars.Length )
    $result.FromBase64Ms = ((Get-Date) - $start).TotalMilliseconds

    # Record results for output and graphing
    $testResults += $result
}

# Output the results to CSV
$testResults | Export-Csv "base64-results.csv" -Force
rhubarb-geek-nz commented 1 month ago

but the stack trace showed the AMSI call so I think most of the time is spent there rather than in managed-code string formatting.

I don't think it is so much the matter of time but heap usage, the StringBuilder will have another copy of the base64 data, then the construction of content will do similar.

I suggest you can do it without formatting or StringBuilder and only allocate the memory for the final string once.

            var argsBuilder = new string[5+args.Length*2];
            int i = 0;
            argsBuilder[i++] = "<";
            argsBuilder[i++] = targetName;
            argsBuilder[i++] = ">.";
            argsBuilder[i++] = name;

            if (args.Length > 0)
            {
                foreach (var arg in args)
                {
                    argsBuilder[i++] = i==5 ? "(<" : ">, <";
                    argsBuilder[i++] = ArgumentToString(arg);
                }

                argsBuilder[i++] = ">)";
            }
            else
            {
                argsBuilder[i++] = "()";
            }

            string content = String.Join(null, argsBuilder);

Apart from the list of strings themselves in argsBuilder the final string is allocated only in String.Join() which determines the total length first.

rhubarb-geek-nz commented 1 month ago

Sure, AMSI is part of Windows API and doesn't exist in Linux.

Not quite. The logging is still done on Linux, whether AMSI exists or not. This was run on 7.4.2 on Linux

#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes = new-object byte[] -ArgumentList @(,200554320)
$random = new-object Random
$random.NextBytes($bytes)

Gives

=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===
rhubarb-geek-nz commented 1 month ago

My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.

Original timings

 218.4235 ms
 64152.6876 ms

now

 222.2945 ms
 301.4766 ms

Code

#!/usr/bin/env pwsh

$env:__PSDumpAMSILogContent='1'

trap
{
    throw $PSItem
}

$ErrorActionPreference = 'Stop'

$code = @"
using System;
using System.Management.Automation;

    [Cmdlet("ConvertFrom", "Base64String")]
    public class ConvertFromBase64String : PSCmdlet
    {
        [Parameter(Mandatory=true,ValueFromPipeline=true)]
        public String InputString;

        protected override void ProcessRecord()
        {
            WriteObject(System.Convert.FromBase64String(InputString));
        }
    }

    [Cmdlet("ConvertTo", "Base64String")]
    public class ConvertToBase64String : PSCmdlet
    {
        [Parameter(Mandatory=true,ValueFromPipeline=true)]
        public byte[] InputObject;

        protected override void ProcessRecord()
        {
            WriteObject(System.Convert.ToBase64String(InputObject));
        }
    }
"@

Add-Type $code -PassThru | ForEach-Object { Import-Module $_.Assembly }

Get-Command -Noun 'Base64String'

$bytes = new-object byte[] -ArgumentList @(,200554320)

$bytes.Length

$random = new-object Random

$random.NextBytes($bytes)

$now = Get-Date

$base64 = @(,$bytes) | ConvertTo-Base64String

Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

$base64.Length

$bytes = $null

$now = Get-Date

$bytes = $base64 | ConvertFrom-Base64String

Write-Host " $(((Get-Date) - $now).TotalMilliseconds) ms"

$bytes.Length
chopinrlz commented 1 month ago

My workaround is to use two cmdlets to do the Base64 conversions hence bypass the PSAMSIInvocationLogging nonsense.

Thank you, this is very helpful. I was fiddling with using C# and Add-Type, but I noticed that making the call directly in C# via a static function still invokes the AMSI context. I will incorporate this into my PowerPass module to avoid the performance issue and excessive memory usage.

mklement0 commented 1 month ago

@chopinrlz, it isn't calls from C# that trigger an AMSI scan, it is .NET method calls from PowerShell code.

@rhubarb-geek-nz's workaround based on using C# code to implement cmdlets avoids this by letting PowerShell itself mediate the method calls.


Two asides:

rhubarb-geek-nz commented 1 month ago
  • $env:__PSDumpAMSILogContent='1' isn't effective in-session; the env. var. must be set before calling PowerShell.

Not actually true, it merely needs to be set before the first call to get DumpLogAMSIContent, the environment variable is accessed by a lazy load. Eg #21492

        private static readonly Lazy<bool> DumpLogAMSIContent = new Lazy<bool>(
            () => {
                object result = Environment.GetEnvironmentVariable("__PSDumpAMSILogContent");
                if (result != null && LanguagePrimitives.TryConvertTo(result, out int value))
                {
                    return value == 1;
                }
                return false;
            }
        );
  • $bytes = new-object byte[] -ArgumentList @(,200554320) can be simplified to $bytes = [byte[]]::new(200554320), which also avoids an (invisible) [psobject] wrapper that would cause an (unrelated) problem ....

Of course, how could we not have invisible, non-obvious problems in the simplest of code.

mklement0 commented 1 month ago

@rhubarb-geek-nz:

Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:

$null, 1 | % {
  Write-Host ---
  $env:__PSDumpAMSILogContent = $_
  pwsh -noprofile { [byte[]]::new(0) } 
}
$env:__PSDumpAMSILogContent = $null

Of course, how could we not have invisible, non-obvious problems in the simplest of code.

I assume this is pure sarcasm (which I do not endorse, but I empathize with the frustration I presume to underlie it); if there's an actual argument in there (beyond what #21496 expresses), please tell us.

rhubarb-geek-nz commented 1 month ago

Not actually true, because for predictable diagnostic output you indeed do need the set the environment variable first, as evidenced by the following:

The script was predictable because it had the "#!/usr/bin/env pwsh" at the start, the executable it bit set, and was designed to run directly from bash. It sets the environment variable after powershell has started but before the first reflection invocation.

if there's an actual argument in there, please tell us.

The frustration is because everytime you think you have found the solution with PowerShell, there is always another reason, case, exception or scenario where it breaks. As a user you don't have the tools to see all these problems because the very objects themselves play stupid games trying to pretend to be something they are not, or changing from what you thought it should have been. I can only assume I am not the target audience for this tool despite it supposedly being for system administrators, developers and IT professionals.

mklement0 commented 1 month ago

The script was predictable

No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic ASMI information, but it will do in the former case.

there is always another reason, case, exception or scenario where it breaks.

Again I empathize. But simply venting your frustration isn't the way forward.

In the case at hand I've (indirectly) pointed to the (ultimate) root cause of the underlying problem - #5579. I suggest channeling your frustration into constructive feedback - while being cognizant that such feedback may or may no be heard.

rhubarb-geek-nz commented 1 month ago

The script was predictable

No, it isn't predictable. My previous example stands. If the calling process doesn't have environment variable __PSDumpAMSILogContent already set before invocation, a call to [byte[]]::new(0) will not dump the diagnostic ASMI information, but it will do in the former case.

Yes, you are absolutely right. I wasn't predictable because you might have been using PowerShell as your default shell to launch scripts. Whereas all other UNIX shell scripts start a new script with executable bit set in a new process, we are talking about PowerShell here. Sigh.

Perhaps a recommendation for running test scripts is "In a new process....", not "In whatever process with whatever indeterminate state you happen to have....."

mklement0 commented 1 month ago

I wasn't predictable because you might have been using PowerShell as your default shell

I was, but that is irrelevant: the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells. See below.

mklement0 commented 1 month ago

P.S.: @rhubarb-geek-nz:

rhubarb-geek-nz commented 1 month ago

the only thing that matters is whether the calling process had a __PSDumpAMSILogContent variable defined or not, so this equally applies to POSIX-compatible shells.

I beg to offer a different opinion...

$ ls -ld new.ps1
-rwxr-xr-x 1 me users 139 Apr 18 00:57 new.ps1
$ cat new.ps1
#!/usr/bin/env pwsh
$env:__PSDumpAMSILogContent='1'
$bytes1024 = new-object byte[] -ArgumentList @(,1024)
$bytes2048 = [byte[]]::new(2048)

Scenario A - The environment variable is not set in the calling process

$ echo $__PSDumpAMSILogContent

$ ./new.ps1

=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===

Scenario B - it is set to 0 in the calling process

$ __PSDumpAMSILogContent=0
$ echo $__PSDumpAMSILogContent
0
$ ./new.ps1

=== Amsi notification report content ===
<System.Byte[]>.new(<2048>)
=== Amsi notification report success: False ===
mklement0 commented 1 month ago

You're right - via the CLI (as implicitly used via a shebang-based executable shell script), the in-process setting is honored, if:

A simpler demonstration: Start a pristine POSIX-compatible shell and run the following:

export -n __PSDumpAMSILogContent # ensure that the env. var. isn't defined.

# AMSI log output via env. var. defined BEFORE 
__PSDumpAMSILogContent=1 pwsh -noprofile -c '$null = [byte[]]::new(2048)'

# !! Produces AMSI output too, because the environment variable - despite being set in-session - is
# !! set *before the first method call*.
pwsh -noprofile -c '$env:__PSDumpAMSILogContent = 1; $null = [byte[]]::new(2048)'

Note:

rhubarb-geek-nz commented 4 weeks ago

Rather than making a cmdlet for every .NET method you wish to call, you can simply put reflection in a single cmdlet.

$bytes = [byte[]]@(1,2,3)
$base64 = [string](Invoke-Reflection -Method ToBase64String -Type ([System.Convert]) -ArgumentList @(,$bytes))
Invoke-Reflection -Method FromBase64String -Type ([System.Convert]) -ArgumentList @(,$base64) | Format-Hex
rhubarb-geek-nz commented 4 weeks ago

and using a filename extension with an executable shell script is generally ill-advised.

Really? That is one I have not heard of.... eg

$ ls -ld *.sh
-rwxr-xr-x 1 github users  772 May 18  2023 debug.sh
-rw-r--r-- 1 github users  107 May 18  2023 download.sh
-rwxr-xr-x 1 github users 2094 May 18  2023 generate-icns.sh
-rwxr-xr-x 1 github users 7418 May 18  2023 install-powershell.sh
-rwxr-xr-x 1 github users 7307 May 18  2023 installpsh-amazonlinux.sh
-rwxr-xr-x 1 github users 9229 May 18  2023 installpsh-debian.sh
-rwxr-xr-x 1 github users 7791 May 18  2023 installpsh-gentoo.sh
-rw-r--r-- 1 github users 7533 May 18  2023 installpsh-mariner.sh
-rwxr-xr-x 1 github users 6483 May 18  2023 installpsh-osx.sh
-rwxr-xr-x 1 github users 6425 May 18  2023 installpsh-redhat.sh
-rwxr-xr-x 1 github users 9081 May 18  2023 installpsh-suse.sh

If you mean executable PowerShell scripts without the ps1 extension, we know how that ends up.

rhubarb-geek-nz commented 4 weeks ago

Another alternative is to do the reflection directly in PowerShell itself

ToBase64String is

$method = ([System.Convert]).GetMethod('ToBase64String',[type[]]@(,([byte[]])))

$base64 = [string]($method.Invoke($null,@(,$bytes)))

FromBase64String is

$method = ([System.Convert]).GetMethod('FromBase64String',[type[]]@(,([string])))

$bytes = $method.Invoke($null,@(,$base64))

Then the AMSI logging just looks like

=== Amsi notification report content ===
<System.Random>.NextBytes(<System.Byte[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<ToBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.RuntimeType>.GetMethod(<FromBase64String>, <System.Type[]>)
=== Amsi notification report success: False ===

=== Amsi notification report content ===
<System.Reflection.RuntimeMethodInfo>.Invoke(<null>, <System.Object[]>)
=== Amsi notification report success: False ===

Where the arguments are not dumped because all it prints is System.Object[]

mklement0 commented 4 weeks ago

Really? That is one I have not heard of...

Unfortunately, many ill-advised practices are common. An executable shell script (using a shebang line) is an executable like any other, and there is no benefit to signaling to a caller that a given executable happens to be a shell script, which is (a) an implementation detail and (b) may lead users to believe that sh <script>.sh should be used for invocation, which can fail if the script uses Bashisms, for instance.

With PowerShell, specifically, things get tricky (leaving the bug you mention aside), because, unlike analogous shell scripts for POSIX-compatible shells, an executable, shebang line-based .ps1 file is still executed in-process, with the potential to alter the session state. An executable, shebang line-based .ps1 file must therefore be designed with this in mind.

One without this extension consistently runs in a child process - albeit more slowly and at the expense of not having rich type support in the in- and output and the inability to pass array arguments and arguments that have no string-literal representations - but a PowerShell script that is designed to (also) run as a standalone executable should not rely on these features anyway.

I presume it is the latter limitations that explain why - at least in my perception - shebang line-based PowerShell scripts haven't really caught on and why bugs such as #21402 are still not fixed.