Open EgorBo opened 2 years ago
Tagging subscribers to this area: @directhex See info in area-owners.md if you want to be subscribed.
Author: EgorBo
Assignees: -
Labels: `untriaged`, `area-Infrastructure-mono`
Milestone: -
Adding @SamMonoRT
Yes, these are very slow, they run opt+llc on unlinked assemblies.
This test run is pretty regularly timing out -- can we get someone to investigate if the slowness is a bug, or if we need to adjust the timeout?
This PR (https://github.com/dotnet/runtime/pull/66157) should help ease the timeouts seen in the last couple of weeks. Even with that fix, the lane takes 2.5+ hours. We're still discussing this, but we might want to 1. exclude certain long-running tests from PR runs in this lane, and 2. extend the timeout to stabilize CI in the short term.
@SamMonoRT which PR?
Looks like that resolved the problem. I'm going to close this out for now.
It doesn't look fixed to me: every time this job is triggered it takes 4-5 hours, e.g. https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_apis/build/builds/146722/logs/1353 (from https://github.com/dotnet/runtime/pull/81094)
and since it's not an optional pipeline, I think either it has to be made optional or not all of the tests should be precompiled with AOT.
I've written a quick parser for the output (for today's PR ^) and sorted assemblies by the time it takes to run LLVM (opt and llc) for them:
E.g. just by moving AdvSimd tests alone to an outerloop pipeline we can save ~30 minutes (4 dlls)
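For anyone who wants to redo this kind of ranking, here is a minimal sketch. It is not the parser mentioned above. It assumes the per-assembly LLVM times have already been pulled out of the build log into hypothetical lines of the form `<assembly name> <seconds>`; the real AOT compiler output looks different and needs its own extraction step. The function names and all sample names and numbers are made up for illustration.

```python
def parse_timings(lines):
    """Parse hypothetical '<assembly> <seconds>' lines into (name, seconds) pairs."""
    timings = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrelated log lines
        name, seconds = parts
        try:
            timings.append((name, float(seconds)))
        except ValueError:
            continue  # second field is not a number
    return timings


def slowest_first(timings):
    """Sort (name, seconds) pairs by descending time."""
    return sorted(timings, key=lambda t: t[1], reverse=True)


def savings_for_prefix(timings, prefix):
    """Total seconds spent on assemblies whose name starts with `prefix`."""
    return sum(sec for name, sec in timings if name.startswith(prefix))


# Made-up sample input; real numbers would come from the CI log.
sample = [
    "AdvSimd_r.dll 450",
    "AdvSimd_ro.dll 430",
    "Foo.dll 12.5",
    "not a timing line",
    "Bar.dll 90",
]

timings = parse_timings(sample)
for name, sec in slowest_first(timings):
    print(f"{sec:8.1f}s  {name}")

# Estimated time saved by moving one group of assemblies to outerloop.
print(f"AdvSimd total: {savings_for_prefix(timings, 'AdvSimd'):.1f}s")
```

Summing by name prefix gives a rough estimate of what moving a whole group of test assemblies out of the PR lane would save.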
I think I'd rather move the whole thing out and then analyze what we can run per PR.
@EgorBo thanks for putting together the updated list!
Wanted to mention that we should be careful to leave enough testing on PRs to reliably catch failures introduced by adding new Jit tests. In my experience these are not uncommon.
@kotlarmilos @vitek-karas - not sure if this is something your team owns now and what more remains here? Please can you re-assign as appropriate.
I'll take this as it likely has to do w/ the aot compiler performance itself.
My general philosophy is, "PR is for fast reliable tests" so I agree with the theory of moving everything out, then moving things back in that meet that criteria. Ideally we can find the sweet spot of fast + high confidence in finding bugs.
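A rough sketch of the "move everything out, then move back what qualifies" idea, assuming we had per-test duration and pass-rate data. The record fields, the thresholds, and the sample entries are all hypothetical; nothing here reflects how the pipeline is actually configured.

```python
def pr_candidates(tests, max_seconds, min_pass_rate):
    """Return names of tests that are fast and reliable enough for PR runs.

    `tests` is a list of dicts with hypothetical keys:
    'name', 'seconds' (AOT compile + run time), 'pass_rate' (0.0 to 1.0).
    """
    return [
        t["name"]
        for t in tests
        if t["seconds"] <= max_seconds and t["pass_rate"] >= min_pass_rate
    ]


# Made-up data for illustration only.
tests = [
    {"name": "fast_reliable", "seconds": 20, "pass_rate": 1.0},
    {"name": "fast_flaky", "seconds": 15, "pass_rate": 0.9},
    {"name": "slow_reliable", "seconds": 600, "pass_rate": 1.0},
]

# Everything starts in outerloop; only tests meeting both criteria move back.
selected = pr_candidates(tests, max_seconds=60, min_pass_rate=0.99)
print(selected)
```

The thresholds are where the trade-off between PR time and confidence in catching bugs would get tuned.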
`Mono llvmfullaot Pri0 Runtime Tests Run Linux arm64 release` takes around 2.5 hours to finish. There are some interesting anomalies in the logs (I checked various runs), e.g.:
It says that prejitting of a single managed assembly, `Microsoft.Win32.SystemEvents.dll`, takes almost 10 minutes 😮 (mostly in LLVM's opt+llc). I parsed the output into an Excel table:
Is it possible to move some libs/tests to the outerloop, e.g. the `JIT/Methodical/MDArray/GaussJordan/classarr_cs_do/classarr_cs_do` test? And I guess we need to figure out what exactly makes `Microsoft.Win32.SystemEvents.dll` take so long to prejit, since there isn't much stuff in it.
cc @akoeplinger @vargaz @steveisok