Closed jetersen closed 2 years ago
Hello, @jetersen. Please provide a small repo how to reproduce the issue.
Well repro was super easy actually.
https://github.com/jetersen/dotnet.restore.slow.github.action
Locally on my machine the cache speeds up the restore to under 1 seconds. Starting dotnet restore also seem relatively slow when called through pwsh shell π°
The scary part is that the slowness is consistent at least according to the dotnet restore timings.
On bigger projects this time scale goes into minutes and we would expect that the cache would speed up the restore significantly.
Well Linux and MacOS does not have this issue.
Ubuntu-20.04 no cache 14s Ubuntu-20.04 with cache 7s MacOS seems to also benefit from the cache with 6s before cache it was 57s.
This is more along the lines of what I would expect.
No improvement between each run on windows with cache...
In log you can clearly see the timescale on windows not making sense but it is consistent.
Determining projects to restore...
Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\tests\Library.UnitTests\Library.UnitTests.csproj (in 20.78 sec).
Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\tests\Api.UnitTests\Api.UnitTests.csproj (in 20.78 sec).
Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\src\Library\Library.csproj (in 2 ms).
Restored D:\a\dotnet.restore.slow.github.action\dotnet.restore.slow.github.action\src\Api\Api.csproj (in 9 ms).
@al-cheb Hope the repro is good enough, we were seeing the similar behavior in our private repo on our unit test project but also on some of our projects with more dependencies. We do restore our internal packages from an internal NuGet source but that would not affect caching.
@jetersen, Looks like it doesn't work only with .net core 5.0.301
. It is recommended to use actions/setup-dotnet@v1
task to pick up version:
- name: Setup dotnet
uses: actions/setup-dotnet@v1
with:
dotnet-version: '5.0.300'
- uses: actions/cache@v2
with:
path:
~/.nuget/packages
key: ${{ runner.os }}-nuget-5.0.300-${{ hashFiles('**/packages.lock.json') }}
restore-keys: |
${{ runner.os }}-nuget-
- name: dotnet restore
run: |
dotnet restore --verbosity n
Good thing is you now have an good repro case to fix it and hopefully it won't reappear in the next dotnet release π
Hi @jetersen, dotnet 6.0 (preview version) also works properly. Looks like this bug affects only the latest patch version of 5.0
@vsafonkin odd, I guess someone else already caught the regression π
@vsafonkin @al-cheb I think we can close it as you mentioned it only affects the latest 5.0.301
patch. I would still like to know what went wrong in the latest patch but not sure it is something we can fix here π
Thank you for investigating the issue.
This is still happening with the latest version of .NET (6.0.x) -- I have projects consistently taking multiple minutes to restore a minimal set of dependencies.
I have noticed the same thing recently.
Just saw a 1 minute restore for very basic test packages: https://github.com/specshell/specshell.software.ndde/runs/4481718755?check_suite_focus=true @al-cheb @vsafonkin should we open the issue elsewhere is this better suited for dotnet org?
My latest test shows that it's not just dotnet restore
, but almost the entire windows-latest
environment is at least an order of magnitude slower than the exact same workflow running on ubuntu-latest
(diff).
Step | Ubuntu run | Windows run |
---|---|---|
Checkout | 1 s | 12 s |
Install .NET | 13 s | 64 s |
Restore | 8 s | 120 s |
Build | 4 s | 8 s |
Test | 6 s | 7 s |
Total | 34 s | 216 s |
The build and test steps are close enough, but all of the actions that require some sort of network transfer are extremely slow. Is this a known issue with Windows environments, and if so, can something be added to documentation somewhere to indicate that the huge performance hit is expected?
@SteveDesmond-ca nice comparison, I can definitely attest to this based on our private repos.
Ya, okay now the issue is no longer dotnet restore but networking is a significant issue in the test case I built:
action: skip telemetry and other slow downs
Even in the update dependencies action restoring new dependencies is slow on windows. Ubuntu VM spends 5 seconds Windows VM spends 14 seconds.
So network seems to be a issue.
Checking the output of the cache as it includes download speeds something is off: zstd is definitely also a benefit as you can see by the cache size.
Ubuntu
Received 50331648 of 138545810 (36.3%), 48.0 MBs/sec
Received 134351506 of 138545810 (97.0%), 64.0 MBs/sec
Received 138545810 of 138545810 (100.0%), 59.0 MBs/sec
Cache Size: ~132 MB (138545810 B)
Windows
Received 0 of 190559914 (0.0%), 0.0 MBs/sec
Received 96468992 of 190559914 (50.6%), 45.6 MBs/sec
Received 186365610 of 190559914 (97.8%), 58.9 MBs/sec
Received 190559914 of 190559914 (100.0%), 44.9 MBs/sec
Cache Size: ~182 MB (190559914 B)
Sometimes ubuntu seems a lot faster at download:
Received 138545810 of 138545810 (100.0%), 120.4 MBs/sec
Cache Size: ~132 MB (138545810 B)
Could this be fixed tweaking by TcpAckFrequency and TcpNoDelay on the windows VM?
Perhaps update the issue title: slow network transfer on Windows agent
Perhaps worth reconsidering #4424 to have DotNet 6.0 installed as it is the latest LTS.
I had cases when restore took a long time when nuget.org was not available to check the signatures. Setting env var NUGET_CERT_REVOCATION_MODE to offline solved that particular problem. Maybe it is the same issue? It will force NuGet to check the revocation status of the certificate only against the cached certificate revocation list, and NuGet will not attempt to reach revocation servers.
https://docs.microsoft.com/en-us/nuget/reference/errors-and-warnings/nu3028
@zcsizmadia well as we just discussed in the last two comments is that restore is not really the issue but rather the networking seems to be a order of magnitude slower.
@vsafonkin @al-cheb the slowness seems to affect all the windows VM: https://github.com/jetersen/dotnet.restore.slow.github.action/actions/runs/1568508906
Although windows 2022 seems to have a little faster tar.exe which is good I guess π
I am not saying we should fix windows 2016 but at least look at 2019 and 2022. And figure out why they are SO much slower than ubuntu or even macos. Usually MacOS is only 10 seconds slower but I have seen the agents flake sometimes and become slow as you can see in the screenshot below.
I think I found the culprit on windows it is the System.Net.HttpClient: https://github.com/jetersen/dotnet.restore.slow.github.action/blob/main/.github/workflows/ci.yml
Note I tried to switch from use of powershell to pwsh with no luck. It is still the System.Net.HttpClient being slow.
Here I tried using curl.exe which seems to be able to download it 2-3x faster.
I'm starting to look at timestamps in various actions, which is leading me to believe this is actually a disk I/O issue, not a network I/O issue.
@SteveDesmond-ca care to explain the speed difference in curl.exe and tar.exe than? π
Turns out it's both! Here is a minimal repro that doesn't even use .NET at all: downloading dependencies via yarn
(for a real-world medium-sized app) has the same discrepancies between ubuntu-latest
and windows-latest
https://github.com/SteveDesmond-ca/gh-actions-windows-slow/actions/runs/1574970778 | OS | Shell | Startup | Download 500 tgz |
Create 1K dirs | Copy 12K files |
---|---|---|---|---|---|---|
ubuntu-latest |
bash |
0.4 s | 8.8 s | 2.1 s | 3.5 s | |
ubuntu-latest |
pwsh |
0.4 s | 11.0 s | 1.8 s | 4.9 s | |
windows-latest |
bash |
11.1 s | 85.4 s | 2.3 s | 4.4 s | |
windows-latest |
pwsh |
10.0 s | 99.9 s | 3.9 s | 6.0 s |
As you can see, there are also compounding performance issues when using pwsh
(the default on windows-latest
) vs bash
-- this may also contribute to @jetersen's remark that windows-2022
is slightly faster, since pwsh
on windows-2022
is probably running a newer (faster) version of .NET under the hood.
I don't think it is specifically pwsh
that is the issue since I have several .NET repos using pwsh
and ubuntu-latest
with no restore penalty.
Clearly something on the VMs are configured incorrectly which causes terrible Network and Disk I/O operation.
misclick π
Timings from yesterday around 14:00 CET. Shows that this is still an issue.
@jetersen, we are still investigating it, actually I'm confused that downloading via curl has very similar performance. For example, powershell script (download ~200 Mb 200 times):
For ($i = 0; $i -lt 200; $i++) {
$outfile = "dotnet-$i.zip"
curl https://dotnetcli.azureedge.net/dotnet/Sdk/5.0.403/dotnet-sdk-5.0.403-win-x64.zip -o $outfile
rm $outfile
}
and result:
@jetersen , @SteveDesmond-ca , Could you check a restore step with params?
- name: Restore deps
run: |
dotnet new nugetconfig
dotnet restore -v n --packages D:\pkgs
@al-cheb that does not resolve the fact that downloading .NET 6 is also slower. The issue title has changed to be generic: Network and Disk IO in general seems slower on the windows agent. Where curl on windows for some reason does not have this issue.
@al-cheb that does not resolve the fact that downloading .NET 6 is also slower. The issue title has changed to be generic: Network and Disk IO in general seems slower on the windows agent. Where curl on windows for some reason does not have this issue.
Ubuntu agents have slightly higher IOPS disk performance configuration. We use install-dotnet.ps1 script for installation provided by DotNet team . The DownloadFile
and Extract-Dotnet-Package
functions are being slow. We will investigate how to improve performance those functions if we replace DownloadFile
-> WebClient
and Extract-Dotnet-Package
-> 7zip
@al-cheb https://github.com/jetersen/dotnet.restore.slow.github.action/commit/f342429d66b14ce848b79cd59733e00bd9536cb1 does indeed help with NuGet restore. I used a different approach to what you mentioned.
@jetersen, how we see this issue is not related to network configuration of the Windows runner. We have tested Windows image from Azure Marketplace and got the same network performance. Looks like the problem appears from .NET side because we see performance degradation with newer versions of .NET(for example restore on .NET 5.0 is slower then restore on .NET 3.1). Also there is an issue with precached Nuget packages placed on C:\Program Files (x86)\Microsoft SDKs\NuGetPackages\
directory on the runner: restoring packages from cache is more slowly then downloading from nuget.org, so default nuget config (command dotnet new nugetconfig
) contains only nuget.org feed and has better performance. Probably it's related to IO disk operations and looks like there is nothing we can do from our side unfortunately.
As @al-cheb mentions above Ubuntu agents have slightly higher IOPS disk performance configuration.
I think dotnet new nugetconfig
is a fine solution perhaps some of these things should be documented in the setup-dotnet
action or somewhere it makes sense.
Of course there is still the issue with setup-dotnet
being slow to install .NET SDK
Of course there is still the issue with
setup-dotnet
being slow to install .NET SDK
Yes, but to install the .NET SDK, the setup-dotnet
action uses scripts maintained by DotNet team as the recommended approach and we also cannot to do anything with it.
https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-install-script
Closed, because there is nothing we can do from our side.
Created https://github.com/actions/setup-dotnet/issues/260 π
We have I/O perf issues as well, and it's not related to .NET.
For example, when using the actions/cache@v3
Action to restore something from cache. The cache contains a lot of small files that are then unpacked.
Windows:
It took Windows 428 seconds to restore 1337MB.
Linux:
Although Linux had to restore only 1074MB, it took only 34 seconds.
As you can see, Windows was actually downloading files from the cache faster than Windows. However, I suspect that what took the longest was extracting them: to us, it seems to be more of a disk I/O issue rather than a network one.
This is one big example, but we see Windows agents being much slower in other things that perform disk I/O, such as pulling/pushing Docker images.
@ItalyPaleAle windows and ubuntu runners are different in terms of disk allocation and that may explain the issue.
@miketimofeev the performance different is very significant however. I don't know what disks are used, but Linux agents processed 1000MB in 32s and WIndows ones in 320s in our tests above, so 10x slower. (I am aware that it's not apples-to-apples as the files in the caches aren't exactly the same, but this difference should be significant regardless)
This is very frustrating, the nuget downloading was slow on windows, even timing out sometimes, so to make it faster, and more reliable I wanted to use the cache, only to find out that the cache action is also very slow on windows. https://github.com/space-wizards/space-station-14/runs/7017334343?check_suite_focus=true The windows server took 2 minutes, and 29 seconds to restore the compressed file, and the Ubuntu server took 16 seconds. Is there a way to work around this?
Unfortunately, this is still a problem as of July/2023. The dotnet restore
command on ubuntu-latest
runs in seconds, but for both windows-2019
and windows-latest
it takes a minute or more. Is there a fix for this? The problem actually costs us money because of per-minute billing π
I've got a pipeline that I run against Linux, Mac and Windows to test cross platform-ness. The Linux and Mac ones take a few minutes. The windows one can take 7, 8 or 9 minutes.
Yeah, windows is still like 11 minutes slower then linux for us. https://github.com/space-wizards/space-station-14/actions/runs/6272151167/job/17033126765
I also see similar timings: https://github.com/CoderLine/alphaSkia/actions/runs/6441066466
My build times are overall significantly slower in Windows. A dotnet build
(with restore) of my Nuke build project takes 15secs on ubuntu-latest
and >1min on windows-latest
. But also other compilations take significantly more time.
Description
Actually we are seeing the same behavior on GitHub actions running a shell command for
dotnet restore
takes a very long time on windows even when using actions/cache πOriginally posted by @jetersen in https://github.com/actions/virtual-environments/issues/1733#issuecomment-860123620
I cannot replicate locally where a restore with a full nuget cache it takes less than a second.
Area for Triage:
.NET Core
Question, Bug, or Feature?:
Bug
Virtual environments affected
Image version
Expected behavior
actions/cache should speed up dotnet restore to take a few seconds
Actual behavior
Dotnet restore even with actions/cache takes well over 30 seconds.
Repro steps
https://github.com/jetersen/dotnet.restore.slow.github.action