Open mikem8361 opened 3 months ago
This test timeout reproduces fairly consistently, with only an occasional flaky success.
Since this test was noted to have started failing while https://github.com/dotnet/diagnostics/pull/4743 was bumping the runtime version, I first tried a locally built runtime based on commit 8fac5af2b11dc98fa0504f6fd06df790164ec958, the commit that the PR bumped from, following the instructions at https://github.com/dotnet/diagnostics/blob/main/documentation/privatebuildtesting.md. The timeout still reproduced, so I additionally built the libs subset and copied everything under runtime\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\* as well, but the timeout still reproduced.
Moreover, I rebuilt the diagnostics repo after reverting the runtime bump PR https://github.com/dotnet/diagnostics/pull/4743, and the timeout still reproduced.
As such, it's unclear how far back the timeout began, or which change actually started causing the test to time out.
When debugging the test timeout itself, the test seems to consistently hit an exception at
https://github.com/dotnet/diagnostics/blob/d6d465c2079f8b6deca0e7a8b63cc2ada5ffd259/src/Microsoft.Diagnostics.Monitoring.EventPipe/DiagnosticsEventPipeProcessor.cs#L78
where the token is cancelled, and the exception gets caught at https://github.com/dotnet/diagnostics/blob/d6d465c2079f8b6deca0e7a8b63cc2ada5ffd259/src/Microsoft.Diagnostics.Monitoring.EventPipe/DiagnosticsEventPipeProcessor.cs#L87. It's not clear to me exactly why the token is being cancelled, but seeing as MicrosoftDiagnosticsTracingTraceEventVersion hasn't changed in a year, it doesn't seem related to that package.
@noahfalk @davmason Do y'all have any suspicions as to how the token might be getting cancelled? I believe these were the async callstacks from DiagnosticsEventPipeProcessor.Process().
I am not super familiar with this part of the code, so I don't have an idea of what is causing the token to be cancelled.
Re-enable this test after it is fixed.
Failure: