PowerShell / ThreadJob

MIT License
25 stars 11 forks source link

Race condition exception when accessing $Host.UI.psobject.Properties inside a runspace #31

Open kborowinski opened 1 year ago

kborowinski commented 1 year ago

Prerequisites

Steps to reproduce (thanks to @SeeminglyScience)

Environment:

  1. PowerShell 7.3.6 or higher on up-to-date Windows 10 22H2 x64
  2. ThreadJob 2.0.3 and Microsoft.PowerShell.ThreadJob 2.1.0

Sporadic race condition exception when accessing $Host.UI.psobject.Properties inside a runspace started with Start-ThreadJob:

  1. Start new session: pwsh -nop

  2. Paste following:

    
    $event = [System.Threading.ManualResetEvent]::new($false)
    $jobs = 1..500 | % {
    Start-ThreadJob {
        $event = $using:event
        $null = $event.WaitOne()
        $supportsVT = $Host.UI.psobject.Properties['SupportsVirtualTerminal'].Value
    } -StreamingHost $Host
    }

Start-Sleep -Milliseconds 500 $null = $event.Set()

$jobs | Receive-Job -AutoRemoveJob -Wait

3. If you couldn't repro first time, start new PowerShell session, do not try in the same session.

### Steps to reproduce with Pester 5.5.0:

**Environment:**
1. PowerShell 7.3.6 or higher on up-to-date Windows 10 22H2 x64
2. Pester 5.5.0
3. ThreadJob 2.0.3 and Microsoft.PowerShell.ThreadJob 2.1.0

Sporadic race condition exception when accessing `$Host.UI.psobject.Properties` inside a runspace started with `Start-ThreadJob`:
1. Start new session:
`pwsh -nop`

2. Paste following:
```powershell
1..500 | ForEach-Object {
    Start-ThreadJob {
        1 | Should -BeExactly 1
        $supportsVT = $Host.UI.psobject.Properties['SupportsVirtualTerminal'].Value
    } -StreamingHost $Host
} | Wait-Job | Receive-Job | Remove-Job -Force
  1. If you couldn't repro first time, start new PowerShell session, do not try in the same session.

More details here.

Expected behavior

There should be no exception.

Actual behavior

InvalidOperation: Cannot index into a null array.
InvalidOperation: Cannot index into a null array.
InvalidOperation: Cannot index into a null array.
InvalidOperation: Cannot index into a null array.

Error details

PS C:\> Get-Error

Exception             :
    Type        : System.Management.Automation.RuntimeException
    ErrorRecord :
        Exception             :
            Type    : System.Management.Automation.ParentContainsErrorRecordException
            Message : Cannot index into a null array.
            HResult : -2146233087
        CategoryInfo          : InvalidOperation: (:) [], ParentContainsErrorRecordException
        FullyQualifiedErrorId : NullArray
        InvocationInfo        :
            ScriptLineNumber : 1
            OffsetInLine     : 26
            HistoryId        : -1
            Line             : 1 | Should -BeExactly 1; $supportsVT = $Host.UI.psobject.Properties['SupportsVirtualTerminal'].Value
            PositionMessage  : At line:1 char:26
                               + … eExactly 1; $supportsVT = $Host.UI.psobject.Properties['SupportsVirtu …
                               +               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            CommandOrigin    : Internal
        ScriptStackTrace      : at <ScriptBlock>, <No file>: line 1
    TargetSite  : System.Object CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, System.Object, System.String)
    Message     : Cannot index into a null array.
    Data        : System.Collections.ListDictionaryInternal
    Source      : Anonymously Hosted DynamicMethods Assembly
    HResult     : -2146233087
    StackTrace  :
   at CallSite.Target(Closure, CallSite, Object, String)
   at System.Dynamic.UpdateDelegates.UpdateAndExecute2[T0,T1,TRet](CallSite site, T0 arg0, T1 arg1)
   at System.Management.Automation.Interpreter.DynamicInstruction`3.Run(InterpretedFrame frame)
   at System.Management.Automation.Interpreter.EnterTryCatchFinallyInstruction.Run(InterpretedFrame frame)
CategoryInfo          : InvalidOperation: (:) [], RuntimeException
FullyQualifiedErrorId : NullArray
InvocationInfo        :
    ScriptLineNumber : 1
    OffsetInLine     : 26
    HistoryId        : -1
    Line             : 1 | Should -BeExactly 1; $supportsVT = $Host.UI.psobject.Properties['SupportsVirtualTerminal'].Value
    PositionMessage  : At line:1 char:26
                       + … eExactly 1; $supportsVT = $Host.UI.psobject.Properties['SupportsVirtu …
                       +               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    CommandOrigin    : Internal
ScriptStackTrace      : at <ScriptBlock>, <No file>: line 1

Environment data

Name                           Value
----                           -----
PSVersion                      7.3.6
PSEdition                      Core
GitCommitId                    7.3.6
OS                             Microsoft Windows 10.0.19045
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Version

ThreadJob 2.0.3 and Microsoft.PowerShell.ThreadJob 2.1.0

Visuals

race

kilasuit commented 1 year ago

I cannot repro this in 5.1 or 7.3.6 when running in or out of WT, either when using Microsoft.PowerShell.ThreadJob v2.1.0 or ThreadJob 2.0.3 on Microsoft Windows 10.0.25905 - aka Win 11 23H2 via the Insiders Dev ring

kborowinski commented 1 year ago

Where can I get the 2.1.0 version, so I could test it on Windows 10 and PowerShell 5.1 and 7.3.6? On PowerShell Gallery there is only 2.0.3. The same on GitHub repo.

kilasuit commented 1 year ago

Install-Module Microsoft.PowerShell.ThreadJob installs the renamed and updated ThreadJob module & ThreadJob module is technically obsolete going forward as when the PowerShell team took ownership of the module it was renamed, although I think this may not have been well communicated at the time

kborowinski commented 1 year ago

Thanks, sorry, I've missed the info about the new module name. Will test it tomorrow. I've just finished a 9 hour car drive for the holidays so I wasn't thinking clearly.

kborowinski commented 1 year ago

@kilasuit Did test it with Microsoft.PowerShell.ThreadJob module v2.1.0, still the same issue: race

To be sure, I have tested it on 3 different computers:

  1. Windows Server 2019,
  2. Windows 10 Home,
  3. Windows 10 Pro

Also always try it on the new PowerShell session, do not try it in the current if there was no repro. If still no success try this example:

$cpuCount = [Environment]::ProcessorCount

$splatStartThreadJob = @{
    ThrottleLimit = $cpuCount
    StreamingHost = $Host
    ErrorAction = 'Stop'
    InitializationScript = {Import-Module Pester -Force -ErrorAction Stop}
}

$threadJobs = 1..($cpuCount * 2) | ForEach-Object {
    Microsoft.PowerShell.ThreadJob\Start-ThreadJob -Name $_ -ScriptBlock {
        $pc = New-PesterConfiguration
        $pc.Run.Container = New-PesterContainer -ScriptBlock {Describe 'd' { It 'i' { 1 | Should -Be 1 }}}
        $pc.Output.Verbosity = 'None'
        $pc.Run.PassThru = $true
        $pc.Run.SkipRemainingOnFailure ='Block'
        [PSCustomObject]@{
            Test    = $Using:_
            Result  = Invoke-Pester -Configuration $pc
        }
    } @splatStartThreadJob
}

Wait-Job -Job $threadJobs | Receive-Job

race2

kilasuit commented 1 year ago

Adding this snippet from the Discord discussion yesterday from @seeminglyscience as another way to repro this

$event = [System.Threading.ManualResetEvent]::new($false)
$jobs = 1..500 | % {
    Start-ThreadJob {
        $event = $using:event
        $null = $event.WaitOne()
        $supportsVT = $Host.UI.psobject.Properties['SupportsVirtualTerminal'].Value
    } -StreamingHost $Host
}

Start-Sleep -Milliseconds 500
$null = $event.Set()

$jobs | Receive-Job -AutoRemoveJob -Wait
SteveL-MSFT commented 1 year ago

Based on the callstack, it appears that ListDictionary is being used somewhere which is not threadsafe. In this case, this doesn't appear to be an issue with ThreadJob itself, but something the thread is accessing is not threadsafe and results in the race condition.

If the actual use case is to see within a thread if SupportsVirtualTerminal is true/false, then it would be best to capture that outside of the thread job and pass it in.

kborowinski commented 1 year ago

The issue was detected first when the Pester was executed with ThreadJob in order to speed up infrastructure tests. Pester is accessing the SupportsVirtualTerminal inside the Pester.psm1. I guess that Pester authors weren't aware that this is not thread safe and it would require change in Pester code.

I got lots of help from @SeeminglyScience, when debugging this issue. Maybe you could contact him for some more in-depth analysis.

SeeminglyScience commented 1 year ago

I would agree that I don't think it's a ThreadJob problem. My guess is that it's a race condition in how PowerShell is creating the internal PSMemberInfoCollection<> and caching it in the member resurrection table.

At a glance it looks like there's a lock around the creation of the underlying collection, but the collection itself is populated lazily with no lock.

I'd also second the suggestion of working around it by accessing the object before hand in the primary thread if possible. I'm hesitant to suggest more locks there as it may have a noticeable impact on performance for an issue that doesn't get hit often.