Azure / azure-powershell

Microsoft Azure PowerShell
Other
4.26k stars 3.86k forks source link

Az Storage (and possibly Synapse) cmdlets that use TaskOutputStream exhibit a slow string memory leak #23787

Open milope opened 10 months ago

milope commented 10 months ago

Description

When running Az Storage modules that run as multi-threaded, they exhibit a memory-leak. NOTE: I did not use $DebugPreference="SilentlyContinue" deliberately as this is how the memory-leak is exhibited. What appears to happen is that The Storage Cmdlets appear to enable Diagnostic Tracing. It appears that diagnostic tracing is enabled on the PowerShell process and a diagnostic trace injection happens. Due to this configuration, all OutputStream.WriteDebug calls get appended endlessly to the diagnostic tracing's Messages properties. This includes all request and response content from the .NET SDK. The more the repro is run, the larger the messages and other TaskOutputStream queues get.

See below where a quick 500 iterations would pile up 54270 strings in the Messages property alone. Continue running iterations and monitor that property grow larger and larger. Please note: The -Debug flag is not set and the $DebugPreference is set to SilentlyContinue

Workaround

A quick workaround I've been forced to do is to call [Diagnostics.Trace]::Listeners.Clear() right before and after any long-running background Storage cmdlet like the following:

[Diagnostics.Trace]::Listeners.Clear()
Get-AzStorageBlob -Container $containerName -Context $storageContext -IncludeVersion -IncludeDeleted | Select-Object -Property @{label="UndeleteResult";expression={ $_.BlobClient.Undelete(); } }
[Diagnostics.Trace]::Listeners.Clear()

Debugging

When opening a 1GB memory dump running code similar to above, I found that Azure.Core.Diagnostics.AzureEventSourceListener had an inclusive size of well over 400 MB:

image

First Path to Root below: image

Recommendations

I would recommend avoid storing Debug messages to these ConcurrentQueue<String> objects if Debug output is not enabled. Also once the Debug output has been streamed to console or destination, maybe the Queues can be cleared.

Issue script & Debug output

$resourceGroupName = "resource-group-name"
$storageAccountName = "storageaccountname"
$containerName = "storage-container-name"
$location = "location"
$numberOfBlobs = 3000 #The more, the better
$strLength = $numberOfBlobs.ToString().Length

$rg = Get-AzResourceGroup -Name $resourceGroupName -ErrorAction SilentlyContinue
if($null -eq $rg) {
    New-AzResourceGroup -Name $resourceGroupName -Location $location | Out-Null
}

New-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName -SkuName Standard_LRS `
    -Location $location -Kind Storage -EnableHttpsTrafficOnly $true
$context = (Get-AzStorageAccount -ResourceGroupName $resourceGroupName -Name $storageAccountName).Context
$container = New-AzStorageContainer -Name $containerName -Context $context
"This is a test file" | Out-File "delete-me.txt"
try {
    $blobPath = (1).ToString().PadLeft($strLength,'0') + ".txt"
    Set-AzStorageBlobContent -Context $context -Container $containerName -Blob $blobPath -BlobType Block -File delete-me.txt | Out-Null
    $blob = Get-AzStorageBlob -Container $containerName -Blob $blobPath
    $copyExpression = { ($blob | Start-AzStorageBlobCopy -DestContainer test -DestBlob ($_.ToString().PadLeft($strLength, '0') + ".txt") -Context $blob.Context).Name }
    (2 .. $numberOfBlobs) | Select-Object -Property @{label="index";expression={$_}},@{label="NewBlobName";expression=$copyExpression}
}
finally {
    Remove-Item -Path "delete-me.txt"
}
Write-Host "Diagnostic Messages Count: $([Diagnostics.Trace]::Listeners.Messages.Count)"
Write-Host "Last Message: $([Diagnostics.Trace]::Listeners.Messages[-1])"

Output
--------

Diagnostic Messages Count: 54270
Last Message: Response [70f67b4d-3d0c-4742-9c26-61cbb36a8c9e] 200 OK (00.0s)
Accept-Ranges:bytes
ETag:"0x8DC00AE899B52CC"
Server:Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id:5fe94690-b01e-0000-1a97-326a9c000000
x-ms-client-request-id:70f67b4d-3d0c-4742-9c26-61cbb36a8c9e
x-ms-version:2022-11-02
x-ms-creation-time:Tue, 19 Dec 2023 16:21:21 GMT
x-ms-lease-status:unlocked
x-ms-lease-state:available
x-ms-blob-type:BlockBlob
x-ms-copy-id:d8c9185d-ca33-45bf-8ae0-1248370ba2e2
x-ms-copy-source:REDACTED
x-ms-copy-status:success
x-ms-copy-progress:21/21
x-ms-copy-completion-time:Tue, 19 Dec 2023 16:21:21 GMT
x-ms-server-encrypted:true
Date:Tue, 19 Dec 2023 16:21:21 GMT
Content-Length:21
Content-Type:application/octet-stream
Content-MD5:pNg7lQmZ9Wn3/ujZb7MZAA==
Last-Modified:Tue, 19 Dec 2023 16:21:21 GMT

Environment data

PS > $PSVersionTable

Name                           Value
----                           -----
PSVersion                      7.4.0
PSEdition                      Core
GitCommitId                    7.4.0
OS                             Microsoft Windows 10.0.22000
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Module versions

PS> Get-Module Az*

ModuleType Version    PreRelease Name                                ExportedCommands
---------- -------    ---------- ----                                ----------------
Script     2.12.5                Az.Accounts                         {Add-AzEnvironment, Clear-AzConfig, Clear-AzContext, Clear-AzDefault…}
Script     6.7.0                 Az.Resources                        {Export-AzResourceGroup, Export-AzTemplateSpec, Get-AzDenyAssignment, Get-AzDeployment…}
Script     5.9.0                 Az.Storage                          {Add-AzRmStorageContainerLegalHold, Add-AzStorageAccountManagementPolicyAction, Add-AzStorageAccountNetworkRule, Close-AzStorageFileHandle…

Error output

N/A.
milope commented 10 months ago

Adding a bit more. This is my memory usage a bit after:

Get-AzStorageBlob -Container test -Context $context -IncludeVersion -IncludeDeleted | Remove-AzStorageBlob -Force

image

PS> [Diagnostics.Trace]::Listeners.Messages.Count 31418865

isra-fel commented 10 months ago

Thanks for reporting and the detailed investigation. I'm locating the root cause. Just to double check:

Recommendations I would recommend avoid storing Debug messages to these ConcurrentQueue objects if Debug output is not enabled. > Also once the Debug output has been streamed to console or destination, maybe the Queues can be cleared.

What did you refer to by ConcurrentQueue<String> objects

milope commented 10 months ago

https://github.com/Azure/azure-powershell/blob/b1e84cea80210e5be6c548aafc12e8dcfa0f9881/src/Storage/Storage/Common/TaskOutputStream.cs#L91

Likely this one in particular. I managed to dump a 4GB + dump and the memory analysis on VS pointed at that object consuming a lot of memory. I dumped some of the objects and it’s full of request and responses from the Storage SDK as if a trace injection started picking up all Azure-Core events and storing it in those queues.