actions / runner-images

GitHub Actions runner images

Windows Server 2022 builds are taking 4x longer than on Windows Server 2019 #5166

Open xt0rted opened 2 years ago

xt0rted commented 2 years ago

Description

Over the last 2 weeks I've noticed runs on the windows-2022 image taking 4x longer, and sometimes more than that, than on the windows-2019 image. In all of these instances nothing's changed between runs aside from the version of Windows being used.

One of the examples I'm looking at from about 2 weeks ago took 7 minutes to complete, while a run from a day or two ago took 24 minutes. This increase in time is consistent across all new runs in our org.

In another repo I downgraded our build scripts & vm to Windows Server 2019 and the time went from 19 minutes down to about 5.5 minutes. I'm unable to permanently move back to 2019 though because new builds depend on VS 2022.

workflow

While testing something else I ran a simple checkout & build of an empty .net project and the build times for ubuntu-latest (25s) and windows-2019 (1m 12s) were about what I'd expect, while the windows-2022 image clocked in at 8m 14s.

All of this was originally reported to support in ticket 1521042 but I was told to open an issue here instead. That ticket has org/repo names and links to each run.

Virtual environments affected

Image version and build link

None of the repos are public but these are the Run Ids for each.

Runs in the image:

| Run        | VM           | Version    | Time    |
|------------|--------------|------------|---------|
| 1860104886 | windows-2019 | 20220207.1 | 6m 51s  |
| 1912616149 | windows-2022 | 20220220.1 | 23m 59s |

From another similar repo:

| Run        | VM           | Version    | Time   |
|------------|--------------|------------|--------|
| 1912952429 | windows-2019 | 20220223.1 | 5m 25s |
| 1912601548 | windows-2022 | 20220220.1 | 19m 8s |

Test repo:

| Run        | VM            | Version    | Time   |
|------------|---------------|------------|--------|
| 1912879771 | ubuntu-latest | 20220220.1 | 25s    |
| 1912879232 | windows-2019  | 20220223.1 | 1m 12s |
| 1912864061 | windows-2022  | 20220220.1 | 8m 14s |
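To put numbers on the tables above, here is a minimal Python sketch that converts the reported durations to seconds and computes the windows-2022 slowdown relative to windows-2019. The helper names `to_seconds` and `slowdown` are made up for this illustration; the durations are copied from the tables.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '23m 59s' or '25s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?\s*(?:(\d+)s)?", duration.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(part or 0) for part in match.groups())
    return minutes * 60 + seconds


def slowdown(baseline: str, candidate: str) -> float:
    """Return how many times longer `candidate` took than `baseline`."""
    return to_seconds(candidate) / to_seconds(baseline)


# (windows-2019 time, windows-2022 time) pairs taken from the tables above.
runs = {
    "first repo": ("6m 51s", "23m 59s"),
    "similar repo": ("5m 25s", "19m 8s"),
    "test repo": ("1m 12s", "8m 14s"),
}

for name, (win2019, win2022) in runs.items():
    print(f"{name}: {slowdown(win2019, win2022):.2f}x")
```

With these figures the two private repos come out at roughly 3.5x and the empty test project at roughly 6.9x.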

Is it regression?

No response

Expected behavior

Run times on par with Windows 2019

Actual behavior

Run times up to 4-5x longer than Windows 2019

Repro steps

Run a .NET full framework build on both Windows 2019 and Windows 2022. The 2022 runs should take significantly longer.

xt0rted commented 2 years ago

My build times just dropped to 5-10 minutes. All that changed was I downgraded the .net 6 sdk to .net 5. I doubt that's related, but I am seeing times on par with what I used to get.

miketimofeev commented 2 years ago

@xt0rted hi! Could you provide minimal steps to reproduce the issue? We can run some experiments if we have a reproducible example.

xt0rted commented 2 years ago

@miketimofeev I can't reliably reproduce this. I see the issue in a couple of my private repos, but it seems to come and go now. It's only happening with the Windows 2022 workflows, though; downgrading to 2019 where possible fixes it and the build times remain consistent. I'll keep looking for a way to reproduce this.

About 2 weeks ago I had a build on windows-latest which was taking anywhere from 6-10 minutes, switched to ubuntu and it dropped to 58 seconds. I know windows takes longer but it shouldn't be that much longer. I can dig up some details on that if you'd like, but the only change made at that point was switching the OS.

mikhailkoliada commented 2 years ago

@xt0rted Hello! Thanks, we will take a look!

miketimofeev commented 2 years ago

@xt0rted I'm afraid we can't proceed with the investigation without a reproducible example, and we don't have access to the support tickets.

xt0rted commented 2 years ago

Not sure if it's related, but here's an instance where the Windows 2022 build took 3x longer than the Windows 2019 build https://github.com/xt0rted/dotnet-rimraf/actions/runs/2041805895.

I don't have a Windows 2019 build to compare to for this one, but you can see the Windows 2022 build is significantly longer than the macOS or Ubuntu builds https://github.com/xt0rted/dotnet-run-script/actions/runs/2041939525.

rilysh commented 2 years ago

I can confirm I'm having the same issue. On Windows Server 2022, builds take quite a bit longer than on 2019.

lowlydba commented 2 years ago

My Windows 2022 builds are so slow that they're repeatedly timing out and/or having issues provisioning resources. Works fine with 2019.

https://github.com/lowlydba/lowlydba.sqlserver/actions/runs/2458069276

al-cheb commented 2 years ago

@lowlydba, is it possible to replace the bash shell with shell: "wsl-bash {0}" for Windows Server 2022 in your CI?

lowlydba commented 2 years ago

D'oh! I'm curious how that worked in the first place. Unfortunately, after fixing it, it's still erroring out.

al-cheb commented 2 years ago

> D'oh! I'm curious how that worked in the first place. Unfortunately, after fixing it still is erroring out.

I am able to reproduce this issue on my self-hosted agent. Currently we are getting a BSOD on Windows Server 2022 with WSLv1. We are planning to investigate whether we can migrate from WSLv1 to WSLv2. I will let you know as soon as I find something.

dmitry-shibanov commented 2 years ago

Hello @xt0rted. Sorry for the late response. Does the issue reproduce with the new images?

xt0rted commented 2 years ago

@dmitry-shibanov I haven't been working on the projects where I first encountered this (the ones in the original screenshots) but I do still see slower Windows 2022 times in one of my public projects. Windows is always the slowest, and in some cases by 6x or more for a pretty simple build & test workflow.

With the announcement of larger runners, are there any specs available for what type of hardware they're using? Specifically, how does I/O compare on the larger runners vs. the existing ones?

xt0rted commented 2 years ago

@dmitry-shibanov I'm revisiting this and testing Actions vs. DevOps at work. Here's another very consistently reproducible example of the difference in Windows 2022 times vs. other platforms.

image

The Ubuntu runs are failing due to a unit test, but they're running to completion so it's still a fair comparison, and fixing the test doesn't yield different results.

While working on this it was pointed out to me that since the Windows runners cost 2x per minute, and they take 2-4x longer to run, in the end you're paying 4-8x for Windows over Ubuntu in this example. That's pretty ridiculous and makes this a really hard sell over sticking with a larger custom VM on DevOps.
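The arithmetic behind that 4-8x figure can be sketched in a few lines of Python. The 2x per-minute price and the 2-4x run time range are the numbers quoted in the comment above; the function name `relative_cost` is made up for this illustration.

```python
def relative_cost(price_multiplier: float, time_multiplier: float) -> float:
    """Cost of a job relative to the baseline runner.

    Billing is per minute, so a runner that costs `price_multiplier` times
    as much per minute and runs `time_multiplier` times as long costs the
    product of the two.
    """
    return price_multiplier * time_multiplier


# Per-minute price of Windows relative to Ubuntu, as quoted in the comment.
WINDOWS_PRICE_MULTIPLIER = 2

# Observed slowdown range from the comment: 2x to 4x longer.
for time_multiplier in (2, 4):
    cost = relative_cost(WINDOWS_PRICE_MULTIPLIER, time_multiplier)
    print(f"{time_multiplier}x slower -> {cost:g}x the cost")
```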

Piedone commented 2 years ago

For those interested, I did some investigation of various performance optimization options (NTFS settings, virtual drives with different file systems), including the necessary scripts and measurements here: https://github.com/Lombiq/GitHub-Actions/issues/32 Spoiler alert: I didn't find anything that would be possible to use and would help.

AHuusom commented 2 years ago

Does it have anything to do with .NET Core SDK 6.0.401 only being installed on 2022 and not on 2019?

Shane32 commented 1 year ago

Windows runners are always painfully slow. I use both Windows and Linux runners a lot, and Windows is always much much slower. Take a look at this public repository's workflows -- click on any run:

https://github.com/graphql-dotnet/server/actions/workflows/test.yml

You'll see that runs on Windows always take nearly double the time. Here's another repository:

https://github.com/graphql-dotnet/graphql-dotnet/actions/workflows/test-code.yml

Sometimes the Windows runners do run similar to Linux runners. But more often than not they are nearly twice as slow. For example see this run -- 6 min on Linux vs 12 min on Windows -- and note that the Ubuntu workflow has more build steps!

https://github.com/graphql-dotnet/graphql-dotnet/actions/runs/4791811277/jobs/8522652049

> While working on this it was pointed out to me that since it costs 2x per minute for the Windows runners, and they take 2-4x longer to run, that in the end you're paying 4-8x for Windows over Ubuntu in this example. That's pretty ridiculous and makes this a really hard sell over sticking with a larger custom VM on DevOps.

Totally agree. It would at least be better if it ran just as fast and they simply charged 4x.

Danielku15 commented 1 year ago

I can also see significantly worse performance on Windows agents in my new project, where I build in quite a wide matrix of configurations: https://github.com/CoderLine/alphaSkia/actions/workflows/build.yml

Just some numbers from my pipeline showing the differences:

Or also here: image

When looking into the steps, they all take significantly longer on Windows: cloning, building, artifacts, CPU tasks.
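One way to compare individual steps across runners is to time each command explicitly and log the result. This is only a rough sketch: the `time_step` helper and the placeholder commands are assumptions for illustration, not part of any workflow in this thread, and would need to be swapped for the real clone, build, and test commands.

```python
import subprocess
import sys
import time


def time_step(name: str, command: list[str]) -> float:
    """Run one command and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)  # raises if the step fails
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed:.1f}s")
    return elapsed


if __name__ == "__main__":
    # Placeholder steps; replace with the actual pipeline commands.
    steps = {
        "interpreter startup": [sys.executable, "-c", "pass"],
        "cpu task": [sys.executable, "-c", "sum(range(10**6))"],
    }
    timings = {name: time_step(name, cmd) for name, cmd in steps.items()}
```

Running the same script on each OS in the matrix gives per-step numbers that can be compared directly, rather than only the total job time.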

alexciarlillo commented 6 months ago

This may just be anecdotal, but my hope is that it spurs an idea for someone, or that some other person searching for answers will see this. We run CI tests which detect hotkey presses from a native Node module. By design, when the native module detects an event it passes the event to a JS thread's callback to handle it using libuv. When we moved to windows-2022 these stopped functioning. I tracked it down to extreme delays in libuv's uv_async_send. The targeted threads were not waking up to receive events for 10+ seconds. Since we require VS 2022 for our newer builds, we were forced to go with a self-hosted agent running Win 11 for now, which does not exhibit this issue. The issue was also not present in the Win 10 builds we tried and has not appeared on any of our clients in the wild. I had to stop digging into the issue once we had a workaround, but it seems like overall performance issues on these machines could be tied to the behavior I was seeing.
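For anyone who wants to check whether a runner shows similar cross-thread wakeup delays, here is a rough Python analog of that pattern. It does not use libuv or the native module described above; it assumes `asyncio`'s `loop.call_soon_threadsafe` is a reasonable stand-in for waking an event loop from another thread, and the function name `measure_wakeup_latency` is made up for this sketch.

```python
import asyncio
import threading
import time


def measure_wakeup_latency(samples: int = 5) -> list[float]:
    """Return the delay, in seconds, between a worker thread posting a
    callback to the event loop and the loop actually running it."""
    latencies: list[float] = []

    async def main() -> None:
        loop = asyncio.get_running_loop()
        for _ in range(samples):
            done = asyncio.Event()

            def on_event(sent_at: float) -> None:
                # Runs on the event loop thread once it has been woken up.
                latencies.append(time.perf_counter() - sent_at)
                done.set()

            def producer() -> None:
                # Runs on the worker thread and wakes the loop.
                loop.call_soon_threadsafe(on_event, time.perf_counter())

            worker = threading.Thread(target=producer)
            worker.start()
            await done.wait()
            worker.join()

    asyncio.run(main())
    return latencies


if __name__ == "__main__":
    for latency in measure_wakeup_latency():
        print(f"wakeup latency: {latency * 1000:.3f} ms")
```

On a healthy machine these delays are normally tiny, so multi-second values would point at the same kind of scheduling problem, although a Python result cannot confirm anything about libuv itself.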