Open gimlichael opened 1 month ago
@gimlichael would it be possible to create a PR that shows the problem and produces a binary log in your repository please?
@nohwnd I will try to put aside some time to fulfill your request; hopefully before the end of this week.
@nohwnd i have created this branch for you: https://github.com/gimlichael/Cuemon/tree/v9.0.0/dotnet-sdk-issue-43432
I did notice (looking in the source associated with dotnet test
and the timeframe the bug occurred) that you guys are using a lot of .GetAwaiter().GetResult() .. I think this might be an issue in a CI environment with low-spec CPU.
Anyway, feel free to clone it and run it .. still Ubuntu only that fails. Windows run fine.
I did enable the -bl build switch, but I don't see any binary files added to GitHub artifacts. If I go into the cog-wheel, I do see an option to download the log archive, but after 20 MB or so, an error occurs from GitHub.
https://github.com/gimlichael/Cuemon/actions/runs/11000415914/job/30542914518
@nohwnd did you have a chance to look into the issue?
I just tried to do it, but checking out the branch and running dotnet build
fails with many errors like this:
S:\c\Cuemon\test\Cuemon.AspNetCore.Mvc.FunctionalTests\Filters\Diagnostics\FaultDescriptorFilterTest.cs(1342,21): error CS0012: The type 'Disposable' is defined in an assembly that is not referenced. You must add a reference to assembly 'Cuemon.Core, Version=9.0.0.0, Culture=neutral, PublicKeyToken=9f6823cab47d945f'.
Build failed with 5525 error(s) and 4 warning(s) in 13,8s
I don't see contributing.md or other description of build steps, and your CI seems to do the same. What do I need to do to build please?
@nohwnd thank you for getting back.
I just verified your discovery and I realized, that the branch I used had to many experimental changes for the forthcoming v9.0.0 release, why I replaced the branch in question with one a little earlier in the development process.
I have verified that:
git clone https://github.com/gimlichael/Cuemon.git
git checkout v9.0.0/dotnet-sdk-issue-43432
(removed the -b switch due to muscle-memory in my fingers)dotnet build
Now works as expected.
I also verified, that the bug is still present.
Sorry for the inconvenience; and thank you for hinting I should have a look at my community health pages while I am at it 🙂
I've updated to the latest, and I don't know what I am doing wrong but I still see the same errors. I also tried to delete the repo and clone again, but that did not help either.
Are all the changes you wanted to push pushed?
S:\c\Cuemon [v9.0.0/dotnet-sdk-issue-43432]> git log --oneline
bd3caa34 (HEAD -> v9.0.0/dotnet-sdk-issue-43432, origin/main, origin/HEAD, main) V9.0.0/housekeeping for extensions globalization (#90)
d5f284b9 (tag: v9.0.0-preview.9) V9.0.0/move replace-line-endings to xunit (#89)
e1941c3b (tag: v9.0.0-preview.8) V9.0.0/finalize newtonsoft json migration (#88)
aa3bfa62 V9.0.0/finalize swashbuckle aspnetcore migration (#87)
5e308b4f V9.0.0/finalize asp versioning migration (#86)
f2f6db4a Bump Backport.System.Threading.Lock to 2.0.3 (#85)
b0cac8b9 (tag: v9.0.0-preview.7) V9.0.0/finalize yamldotnet migration (#84)
73f14cd6 (tag: v9.0.0-preview.6) V9.0.0/finalize xunit migration (#83)
87ca1047 V9.0.0/net9rc1 housekeeping (#82)
cf9609de (tag: v9.0.0-preview.5) V9.0.0/xunit testoutputhelperaccessor (#78)
19af986a V9.0.0/lock refactoring (#77)
51c275e4 (tag: v9.0.0-preview.4) V9.0.0/hotfix, housekeeping and dependency bump (#76)
450b22ef (tag: v9.0.0-preview.3) V9.0.0/System.Threading.Lock (#73)
a539fa16 V9.0.0/housekeeping and CI fix (#75)
11c206ac (tag: v9.0.0-preview.2) V9.0.0/problem details (#74)
e251523d (tag: v9.0.0-preview.1) V9.0.0/major (#72)
ad2cd5c1 :pencil2: updated changelog
I see you target both net9 and net8, I am on net9-rc2, but I also tried downgrading to net9-preview7 and it has the same problem. The errors also don't seem to be upgraded from warnings.
I have now double- and tripled checked the branch.
Expectations to meet:
If these TFMs are in place, build will succeed.
I was able to locate a typo in my post from before - DO REMOVE -b (as this will checkout a new branch from the main) ..
Here are the adjusted steps:
git clone https://github.com/gimlichael/Cuemon.git
git checkout v9.0.0/dotnet-sdk-issue-43432
dotnet build
Sorry for the typo .. 😪
@nohwnd do let me know, despites the set-backs, if you were able to proceed with bughunting 🙂
I was able to make it run, but I don't see any hangs yet.
Keep in mind, that I cannot provoke the error in the confines of my local development. It is only in GitHub Actions using an Ubuntu runner.
Consider running this flow; it is tweaked to only include the two failing projects.
Hello,
I have the same problem but on different CI. We are using JenkinsX on on-premise K8s (1.29) cluster. We have our own tailored build images based on Ubuntu 22.04. When I update .NET SDK to 9.0 RC1 pipeline stuck on test step. As in case of @gimlichael I run dotnet test with --no-build parameter.
I tried SDK versions 9.0.100-preview.7.24407.12 - OK 9.0.100-rc.1.24452.12 - NOK 9.0.100-rc.2.24474.11 - NOK
First, I suspected breaking change with terminal logger but adding MSBUILDTERMINALLOGGER=off to image does not help.
P.S. I cannot send logs with our internal project but if I will be able to create minimal repro I will publish it.
@nohwnd - any update on this issue? My workaround works, but I still think you guys should fix the issue introduced sometime between preview7 and rc1. For you convenience, maybe this can help you on track: https://github.com/dotnet/sdk/commits/main/src/Cli/dotnet/commands/dotnet-test?since=2024-08-13&until=2024-09-10
I did a few experiments and it looks like that the problem is related with logging.
First I tried blame-hang if the problem is not in tests but without luck
''' dotnet test --configuration Release --verbosity n --no-build --no-restore --blame-hang --blame-hang-dump-type full --blame-hang-timeout 1m '''
then I tried
dotnet test --configuration Release --verbosity q --no-build --no-restore
And build on CI agent successfully passed.
I did a few experiments and it looks like that the problem is related with logging.
First I tried blame-hang if the problem is not in tests but without luck
''' dotnet test --configuration Release --verbosity n --no-build --no-restore --blame-hang --blame-hang-dump-type full --blame-hang-timeout 1m '''
then I tried
dotnet test --configuration Release --verbosity q --no-build --no-restore
And build on CI agent successfully passed.
Interesting - especially because prior to this bug, I always used a verbosity level of quiet. Only after the bug surfaced, I changed verbosity normal, detailed and diagnostic.
GitHub Actions DOES NOT like diagnostics loglevel .. haha .. you cannot even download logs afterwards due to the size.
That said - and as you have discovered - it might still be related to logging. Let's hope the SDK team find a solution 🤞🏼
@nohwnd - any updates? It fails with both RC1 and RC2. Last working build was Preview 7.
I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.
I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.
Fair - thank you for the honest response. Fingers crossed it will be solved with a patch after .NET 9 launch next month 🤞🏼
I am sorry but there are no updates. This is on my work list, but unfortunately not on the top.
Thank you for the update!
We've run into the same issue with RC2, also on Jenkins on an on-prem cluster.
Until fixed this will categorically prevent us from moving over to NET9 whatsoever once released - I'd imagine a good bunch of enterprise users might be in the same boat there, though hard to judge if it just affects some small fraction of CI build setups, or is a wider problem. Either way, thanks of looking into this!
.NET 9 RC2 - dotnet test running on a self-hosted Act runner, running in a Docker container using an Ubuntu image, seems to hang on one host but runs OK on another.
It hangs when I run my tests using this command (works fine in .NET 8):
dotnet test --no-build --verbosity normal
And crashes when I run it with this command
dotnet test --configuration Release --verbosity normal --no-build --no-restore
with the following log
Build started 11/04/2024 19:13:23.
1>Project "/workspace/Starlight/Starlight/Starlight.sln" on node 1 (VSTest target(s)).
1>ValidateSolutionConfiguration:
Building solution configuration "Release|Any CPU".
1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Services.Tests/Starlight.Services.Tests.csproj" (6) on node 7 (VSTest target(s)).
6>_VSTestConsole:
MSB4181: The "VSTestTask" task returned false but did not log an error.
6>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Services.Tests/Starlight.Services.Tests.csproj" (VSTest target(s)) -- FAILED.
1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Updater.Tests/Starlight.Updater.Tests.csproj" (8) on node 2 (VSTest target(s)).
8>_VSTestConsole:
MSB4181: The "VSTestTask" task returned false but did not log an error.
8>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Updater.Tests/Starlight.Updater.Tests.csproj" (VSTest target(s)) -- FAILED.
1>Project "/workspace/Starlight/Starlight/Starlight.sln" (1) is building "/workspace/Starlight/Starlight/tests/Starlight.Models.Tests/Starlight.Models.Tests.csproj" (9) on node 3 (VSTest target(s)).
9>_VSTestConsole:
MSB4181: The "VSTestTask" task returned false but did not log an error.
9>Done Building Project "/workspace/Starlight/Starlight/tests/Starlight.Models.Tests/Starlight.Models.Tests.csproj" (VSTest target(s)) -- FAILED.
1>Done Building Project "/workspace/Starlight/Starlight/Starlight.sln" (VSTest target(s)) -- FAILED.
Build FAILED.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:01.24
Changing the log level to quiet does not help.
Finally, if I just try to run dotnet test
, the test runner only discovers a single test out of many.
I cannot reproduce issue locally on my macOS machine.
Finally getting back to this. If someone has a local better repro, binary log or diagnostic logs that they can share, please let me know.
I was able to download the original logs from Cuemon (from the github action, they show that vstest task is running but that is about it).
2024-09-23T18:48:38.1132167Z 1:19>Done building target "ShowCallOfVSTestTaskWithParameter" in project "Cuemon.AspNetCore.Mvc.FunctionalTests.csproj".
2024-09-23T18:48:38.1132353Z Done executing task "CallTarget".
2024-09-23T18:48:38.1133033Z Using "VSTestTask" task from assembly "/usr/share/dotnet/sdk/9.0.100-rc.1.24452.12/Microsoft.TestPlatform.Build.dll".
2024-09-23T18:48:38.1133182Z Task "VSTestTask"
2024-09-23T18:48:38.1133375Z Task Parameter:VSTestBlameHang=True
2024-09-23T18:48:38.1133584Z Task Parameter:VSTestVerbosity=detailed
2024-09-23T18:48:38.1134227Z Task Parameter:VSTestTraceDataCollectorDirectoryPath=/home/runner/.nuget/packages/microsoft.codecoverage/17.11.1/build/netstandard2.0/
2024-09-23T18:48:38.1134573Z Task Parameter:VSTestResultsDirectory=/home/runner/work/_temp/TestResults
2024-09-23T18:48:38.1135189Z Task Parameter:VSTestTestAdapterPath=/home/runner/.nuget/packages/coverlet.collector/6.0.2/build/netstandard2.0/
2024-09-23T18:48:38.1135410Z Task Parameter:VSTestBlameHangTimeout=10m
2024-09-23T18:48:38.1135922Z Task Parameter:VSTestConsolePath=/usr/share/dotnet/sdk/9.0.100-rc.1.24452.12/vstest.console.dll
2024-09-23T18:48:38.1136191Z Task Parameter:VSTestFramework=.NETCoreApp,Version=v9.0
2024-09-23T18:48:38.1136448Z Task Parameter:VSTestArtifactsProcessingMode=collect
2024-09-23T18:48:38.1136627Z Task Parameter:VSTestLogger=trx
2024-09-23T18:48:38.1136935Z Task Parameter:VSTestCollect=XPlat Code Coverage;Format=opencover
2024-09-23T18:48:38.1137390Z Task Parameter:VSTestSessionCorrelationId=2566_e2458d0a-316a-4f7c-bc36-809795f7b744
2024-09-23T18:48:38.1138235Z Task Parameter:TestFileFullPath=/home/runner/work/Cuemon/Cuemon/test/Cuemon.AspNetCore.Mvc.FunctionalTests/bin/Debug/net9.0/Cuemon.AspNetCore.Mvc.FunctionalTests.dll
2024-09-23T18:48:38.1138458Z Task Parameter:VSTestBlameHangDumpType=mini
2024-09-23T19:00:29.2866221Z ##[error]The operation was canceled.
2024-09-23T19:00:29.2965128Z Post job cleanup.
2024-09-23T19:00:29.3022727Z Post job cleanup.
2024-09-23T19:00:29.3078135Z Post job cleanup.
2024-09-23T19:00:29.3133174Z Post job cleanup.
2024-09-23T19:00:29.4096216Z [command]/usr/bin/git version
2024-09-23T19:00:29.4130361Z git version 2.46.0
2024-09-23T19:00:29.4173260Z Temporarily overriding HOME='/home/runner/work/_temp/a32d568f-9d17-48cb-af35-577639717d18' before making global git config changes
2024-09-23T19:00:29.4175549Z Adding repository directory to the temporary git global config as a safe directory
2024-09-23T19:00:29.4179724Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/Cuemon/Cuemon
2024-09-23T19:00:29.4214951Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2024-09-23T19:00:29.4245963Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2024-09-23T19:00:29.4907938Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2024-09-23T19:00:29.4909542Z http.https://github.com/.extraheader
2024-09-23T19:00:29.4913024Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2024-09-23T19:00:29.4917309Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2024-09-23T19:00:29.5160414Z Cleaning up orphan processes
2024-09-23T19:00:29.5462333Z Terminate orphan process: pid (2566) (dotnet)
2024-09-23T19:00:29.5511293Z Terminate orphan process: pid (2596) (dotnet)
2024-09-23T19:00:29.5581880Z Terminate orphan process: pid (2597) (dotnet)
2024-09-23T19:00:29.5638717Z Terminate orphan process: pid (2598) (dotnet)
2024-09-23T19:00:29.5692699Z Terminate orphan process: pid (2885) (dotnet)
2024-09-23T19:00:29.5781556Z Terminate orphan process: pid (2886) (dotnet)
So pretty much, tests running, some work is being done, and then the parent kills it.
Will try to run on GH, to see if I can get the diag logs;
This is probably the same as https://github.com/microsoft/vstest/issues/5091 ? cc @nohwnd
i am seeing the abort when I have build logging to detailed, but not when I set it to normal.
i am seeing the abort when I have build logging to detailed, but not when I set it to normal.
This is interesting - and on-par with what @McMlok is also experiencing. However, prior to this bug, my default verbosity level was set to quiet, so there might be some different use cases that ties into logging.
@nohwnd I can probably produce some more detailed logs today - if you tell me the exact dotnet test
command you'd want me to run.
I was able to download the original logs from Cuemon (from the github action, they show that vstest task is running but that is about it).
I also saw this with diagnostic logging yesterday, it says a VSTest task is running but it's hung.
just add --diag:logs/log.txt
to your command and it will produce logs into logs
folder in the current dir. Or you can specify any other relative or absolute path. Please grab all the logs in the folder, not just the one called log.txt
This is probably the same as microsoft/vstest#5091 ? cc @nohwnd
it might or might not be. Hard to say without consistent repro :/
@nohwnd I did yet another CI run .. same result with log level set to normal
.
Test completes on Windows; but does not complete on Ubuntu.
Compared to my workaround, two additional thing to mention:
dotnet test
dotnet test
I hope this help you further. Since you are at Microsoft, maybe you can get the full view here: https://github.com/gimlichael/Cuemon/actions/runs/11697884725
I did use the -bl swtich, but it does not look like its part of the zip file.
thanks for the logs, I don't see any diag logs in there, without them I am pretty blind, I tried yesterday to get them but they did not upload I must have did the paths wrong. I tried to do that on my fork (I am syncing your repro branch to my main). https://github.com/nohwnd/Cuemon/actions/runs/11691191041
thanks for the logs, I don't see any diag logs in there, without them I am pretty blind, I tried yesterday to get them but they did not upload I must have did the paths wrong. I tried to do that on my fork (I am syncing your repro branch to my main). https://github.com/nohwnd/Cuemon/actions/runs/11691191041
I can try to enable the --diag:/home/runner/work/_temp/TestResults/logs/log.txt
, but I am afraid that I am unable to download from GitHub (for instance, detailed logs cannot be donwloaded due to size - another bug on the GitHub platform) ..
Update: the --diag switch did not look like it provided any additional content.
you should be able to upload them as artifacts . I also had better luck clicking the cog wheel and selecting "see raw logs" which opens the whole giant log in browser, which takes some time to load it, but then searches it just fine.
your upload needs to be set to contine after previous step fails btw. (mine was not of course, so I don't have the logs yet)
you should be able to upload them as artifacts . I also had better luck clicking the cog wheel and selecting "see raw logs" which opens the whole giant log in browser, which takes some time to load it, but then searches it just fine.
your upload needs to be set to contine after previous step fails btw. (mine was not of course, so I don't have the logs yet)
I have tweaked it further, though I doubt it will make any changes, as I already put failfast; false (hence, everything is a success). I will download the full log as suggested ans ZIP it.
After a little dicking around, it looks like I need to use always()
as per https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/evaluate-expressions-in-workflows-and-actions#always .. I leave it here for others to be inspired by.
Here are the logs you requested: logs.zip
Thanks for the tip, I was struggling with that as well.
Describe the bug
In some cases,
dotnet test
hangs doing or after a build with .NET9 RC1.Prior to .NET9 RC1, this worked without a problem - for previous versions as well.
Full "log" can be found here (as I was unsure which team to put the issue to): https://github.com/dotnet/core/discussions/9496#discussioncomment-10628380
To Reproduce
Make a CI build in GitHub Actions using Ubuntu and this repo: https://github.com/gimlichael/Cuemon The main branch will fail on these two projects:
If you use this branch instead, https://github.com/gimlichael/Cuemon/tree/v9.0.0/net9rc1-gha-troubleshoot, all .NET9 RC1 updates has been reverted to .NET9 preview 7, and the two test projects mentioned above, runs fine.
Exceptions (if any)
No exceptions, but the test will hang - and for me, timeout after 15 minutes (cause I don't like the built-in timeout of 6 hours or so).
Further technical details
After thorough investigation over the last two days, my conclusion is that something has changed for
dotnet test
with .NET9 RC1.Oddly enough, it works fine on Windows 2022 CI build as well.
Locally, using Windows 11, Docker and WSL (Ubuntu) it works as expected as well.
Workaround for .NET9 RC1 and RC2
Segregate
dotnet build
anddotnet test
.NOK:
dotnet test --configuration Debug --verbosity normal --logger trx --results-directory $RUNNER_TEMP/TestResults --collect:"XPlat Code Coverage;Format=opencover" -p:CoverletOutputFormat=opencover -p:UseSourceLink=true -p:SkipSignAssembly=true
OK: