Closed AaronVanGeffen closed 2 years ago
Hello @AaronVanGeffen, do you have a nuget.config file in your repo? I think it is a known issue when you don't have a config file. Could you please try adding a simple nuget.config like the one described in this message: https://github.com/actions/setup-dotnet/issues/155#issuecomment-761195782
Hello @maxim-lobanov, thank you for the suggestion. Indeed, there was no nuget.config file in our repository. Previously this worked fine, so I was not aware we would need one. However, adding the config file as you suggested appears to have helped: https://github.com/OpenLoco/OpenLoco/pull/855
Cool, glad to hear that it helped.
Looks like dotnet sometimes doesn't resolve packages from the remote source when it doesn't find them in the local cache. I don't think it is a new issue since, based on https://github.com/actions/setup-dotnet/issues/155#issuecomment-761195782, it fails pretty randomly without a nuget.config file.
The root cause is still unclear, but using nuget.config is the recommendation from the NuGet team to deal with this issue: https://github.com/NuGet/Home/issues/10586#issuecomment-783689013
Thank you for the explanation. Just to confirm, CI has been working reliably tonight with the nuget.config file in.
@AaronVanGeffen thanks for the confirmation! I'm going to close the issue but feel free to contact us if you have any concerns.
I bumped into the same issue. While the workaround does work, I noticed that the PATH printed by the setup-dotnet action differs between failing and succeeding builds. Failing builds have C:\hostedtoolcache\windows\Java_Adopt_jdk\8.0.282+8\x64\bin in the PATH and succeeding builds have C:\Program Files\Java\jdk8u282-b08\bin. I suspect not all build workers use the same image even if all of them use windows-latest.
@rokups it takes 3-4 days to propagate the new image (with Java in the hostedtoolcache directory) to all the environments. We're going to finish the deployment on Monday
Hello, I believe this issue should be better investigated and (hopefully) solved without the need to roll our own nuget.config file. The NuGet config should be the same across different instances of Windows runners, and I would also expect it to match the Mac and Linux runners (which do not suffer from this problem).
It sounds reasonable to have the nuget.org source enabled by default for any packages that are not found in the local cache (a local cache makes sense so the runtime environments don't keep downloading packages from NuGet.org on every run).
Note: I have two different runs in the same 20210330.2 environment version, one failing and another succeeding. It really looks like runners are not being properly cleaned up after a previous run.
@fabriciomurta no, it's not possible — every run is performed on a clean agent
@miketimofeev @maxim-lobanov Can you please reactivate this issue and help us get to a resolution? The core NuGet issue NuGet/Home#10586 was closed without a resolution other than "clean the nuget cache" or "have a nuget.config" which don't seem productive, since restoring projects used to work fine before the last AzDO pipelines/GHA updates. You're saying it's not possible that this is due to non-clean state on the agent (which I'd agree with), but nuget folks are implying the contrary.
Both this and the NuGet issue are closed and we need someone to step up in either of the teams and provide a solution. We have multiple repos - both internal and external - that got broken since the latest update.
@asklar reopened. Could you please share some repo where the issue persists? We've tried to reproduce last time without any luck
You may use https://github.com/rokups/rbfx if it is not too fat. Delete nuget.config from the root folder to make it fail.
@miketimofeev thanks. We're hitting this in https://github.com/microsoft/react-native-windows/ among others
This is random, so to troubleshoot you should add a step to pinpoint the host of the runner (the actual computer) and schedule it to run in any of the failing sample repos provided, say every 30 min until it fails. I suspect dotnet nuget list source should show a different output in the affected runners' virtual instances (that's my bet with the action I made).
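A probe along those lines could be sketched as a scheduled workflow. This is only an illustration: the workflow name and job id are made up, and hostname reports the name of the runner VM, which may or may not help identify the underlying machine.

```yaml
# Hypothetical probe workflow; the name and job id are illustrative.
name: nuget-source-probe
on:
  schedule:
    - cron: '*/30 * * * *'   # every 30 minutes, as suggested in the thread
  workflow_dispatch:          # allow manual runs as well
jobs:
  probe:
    runs-on: windows-latest
    steps:
      - name: Identify the runner
        run: hostname
      - name: List NuGet sources
        run: dotnet nuget list source
```

Comparing the listed sources between passing and failing runs would show whether nuget.org is missing on the affected runners.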
@fabriciomurta According to https://github.com/NuGet/Home/issues/10586#issuecomment-809998961, running dotnet nuget list source actually fixes the issue on the runner.
@miketimofeev Hitting it here too but it's not random https://github.com/Ryujinx/Ryujinx/ (Example of a failing PR run: https://github.com/Ryujinx/Ryujinx/runs/2287451767)
I have consistently received failures this morning because of this. I am also using windows-latest. Listing the NuGet sources, only the offline one is registered. I had to manually register nuget.org to get it to work.
@jantoineqci this really supports my suspicion the default public nuget source is not being set up and is exactly what the action I made seeks to address. Glad to know it is (very) likely to fix it.
About the 30-min tests I suggested above, it seems somebody already set up a 30-min test in a repo (https://github.com/actions/virtual-environments/issues/1090#issuecomment-814556942), and it is not hitting the problematic instances. So being assigned to those nuget-source-less instances may be related to the subscription or actions usage demand of the repository.
Another hint is that this has been happening for some time in the Azure DevOps environment, according to this comment: https://github.com/actions/virtual-environments/issues/1090#issuecomment-751354009
@database64128, about "any dotnet nuget command fixes the issue" (https://github.com/NuGet/Home/issues/10586#issuecomment-809998961), the following comment (https://github.com/NuGet/Home/issues/10586#issuecomment-810264508) refutes the theory.
@fabriciomurta @jantoineqci we're working on a fix now to have the default source presented in nuget.config
@miketimofeev Thank you! how long will the fix take to propagate? I will put a continue-on-error on my add source step so the fix doesn't break my build again.
@jantoineqci I hope we will start the deployment on Monday and it will take 3-4 days if nothing goes wrong 🤞
Can the previous working Windows image be restored? It broke us too. I'm pretty sure others will notice this.
@ArchieCoder, could you try dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org as the first step of the workflow? It should work as a temporary workaround.
@vsafonkin I have this error:
error: The name specified has already been added to the list of available package sources. Provide a unique name.
I added this line in my workflow:
@ArchieCoder could you provide the output from dotnet nuget list source?
@vsafonkin It is in the link 12_Hack.txt, sorry it was not obvious in my previous post
@miketimofeev thanks good to hear - note that in our case our projects are not using .net core 5 (nor the dotnet CLI).
They're either C++ or C# UWP, yet they hit the same issue. We restore the project during msbuild by passing /restore (and /p:RestorePackagesConfig=true for C++ apps).
@asklar does it mean adding nuget.org as a source doesn't help in your case? dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
@miketimofeev we already have nuget.org in our nuget.config files. Even when the packages fail to restore, at the end it lists the sources and nuget.org is in there:
2021-04-07T00:46:42.6006157Z D:\a\1\s\packages\e2e-test-app\windows\ReactUWPTestApp\ReactUWPTestApp.csproj : error NU1101: Unable to find package Microsoft.UI.Xaml. No packages exist with this id in source(s): Microsoft Visual Studio Offline Packages [D:\a\1\s\packages\e2e-test-app\windows\ReactUWPTestApp.sln]
...
2021-04-07T00:49:50.4499469Z NuGet Config files used:
2021-04-07T00:49:50.4500599Z C:\Users\VssAdministrator\AppData\Roaming\NuGet\NuGet.Config
2021-04-07T00:49:50.4501691Z C:\Program Files (x86)\NuGet\Config\Microsoft.VisualStudio.FallbackLocation.config
2021-04-07T00:49:50.4502788Z C:\Program Files (x86)\NuGet\Config\Microsoft.VisualStudio.Offline.config
2021-04-07T00:49:50.4503733Z C:\Program Files (x86)\NuGet\Config\Xamarin.Offline.config
2021-04-07T00:49:50.4504944Z D:\a\1\s\vnext\NuGet.Config
2021-04-07T00:49:50.4505686Z
2021-04-07T00:49:50.4506486Z Feeds used:
2021-04-07T00:49:50.4507369Z C:\Program Files (x86)\Microsoft SDKs\NuGetPackages\
2021-04-07T00:49:50.4508797Z https://pkgs.dev.azure.com/ms/react-native/_packaging/react-native-public/nuget/v3/index.json
2021-04-07T00:49:50.4511047Z https://api.nuget.org/v3/index.json
2021-04-07T00:49:50.4511784Z
2021-04-07T00:49:50.4512987Z Installed:
2021-04-07T00:49:50.4553915Z 87 package(s) to D:\a\1\s\vnext\Microsoft.ReactNative.Managed.CodeGen\Microsoft.ReactNative.Managed.CodeGen.csproj
2021-04-07T00:49:50.4555679Z 24 package(s) to D:\a\1\s\vnext\Microsoft.ReactNative.Managed\Microsoft.ReactNative.Managed.csproj
2021-04-07T00:49:50.4599808Z Done Building Project "D:\a\1\s\packages\e2e-test-app\windows\ReactUWPTestApp.sln" (Restore target(s)) -- FAILED.
it's as if nuget is requiring that a package must exist on all sources? some packages won't exist on the local sources, some might even be in private feeds (like our Azure Artifacts feed), so this seems like a bad assumption on nuget's part? CC @rainersigwald @rrelyea in case this looks familiar
For the time being, using my little action has avoided the issue when it should happen:
- name: Ensure NuGet Source
  uses: fabriciomurta/ensure-nuget-source@v1
Basically the action will ensure there is a NuGet source (regardless of the name) pointing to https://api.nuget.org/v3/index.json; if not, it will add/update the nuget.org source to point to that.
So whenever you kick in a broken runner it will just fix the source for you.
I have published the action to GitHub marketplace: https://github.com/marketplace/actions/ensure-nuget-source
So the action step may sit in your workflow and shouldn't break the CI process even after the actual fix is implemented.
@miketimofeev about https://github.com/actions/virtual-environments/issues/3038#issuecomment-814979307, if we just add a step to run that command, then CI will fail when it hits a correct runner, because the source would already exist. So we should at least ignore the success/failure of the step if we don't want it to break the workflow.
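One way to ignore the outcome of that step, as a minimal sketch: continue-on-error lets the job carry on when the source already exists and the command exits with an error. The step name is illustrative.

```yaml
# Illustrative step; the name is made up.
- name: Add nuget.org source (temporary workaround)
  run: dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org
  # On a healthy runner the source already exists and the command fails; do not fail the job.
  continue-on-error: true
```

The trade-off is that a genuine failure of this step is also ignored, so the restore step that follows remains the real signal.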
@fabriciomurta , thank you for sharing, good point!
We are trying to understand the root cause of this issue. It has existed for some time but reproduces rarely. It looks like it started to happen more often with the latest updates, but there is nothing obvious on the images that could cause it except the VS update.
To me it is like the actual host (the physical machine hosting the runners) is providing the broken default nuget configuration. So without enforcement from the virtual environment template, the built virtual machine is just inheriting whatever's in its host, instead of ensuring the NuGet sources include the public one.
A hint of what I'm stating is that the macOS and Linux hosts only have the nuget.org entry, and not the Microsoft Visual Studio Offline Packages one. This is easy to see in the unit tests run by the action I wrote: https://github.com/fabriciomurta/ensure-nuget-source/runs/2287645659?check_suite_focus=true. Of course, it may be the case that dotnet/NuGet/we also install that source by default only on Windows systems, so this may just be a coincidence.
From the link above, see Test 1 on each platform; for macOS and Ubuntu, I needed an extra step to add a mock source due to another issue that does not allow me to remove all NuGet sources (in a check I remove the nuget.org default source to ensure the action adds it back).
note: in the action run above I am just highlighting the different nuget sources between platforms; in that action run the windows host got the correct NuGet source, so it didn't hit one of the affected runner environments; in fact I couldn't hit it in my side repository; it seems to like big repos :)
My point is that nuget is erroring out because it didn't find one of the packages in the "VS offline packages" feed, which should not be an error, it should keep trying the other sources; if it gets to try the nuget.org source it will find them.
I tried the workaround on our build and we are getting the following with the workaround:
"The name specified has already been added to the list of available package sources. Provide a unique name."
@asklar I agree with your point; per your logs, you are facing a little different issue than I am.
The issue I have is consistent with @jantoineqci's (https://github.com/actions/virtual-environments/issues/3038#issuecomment-814904484), where the NuGet source is not set up on some windows-latest runs.
In your case it seems the sources are correctly set up yet the NuGet packages are not found, for some reason.
FYI @eekamouse @vsafonkin
Thanks @fabriciomurta, your fix works. Is it safe to keep it even after msft fixes the issue?
@ArchieCoder yes, if the NuGet source is there, the step is not doing anything, so it should be safe to be kept with or without whatever fix is implemented in this issue.
It should only potentially break CI runs if the official public NuGet repository URL stops responding at https://api.nuget.org/v3/index.json. (because it ensures you have at least one source pointing to that URL; it will add a new one if no nuget.org source exists, or update it if it points elsewhere -- that's where it could break if public NuGet URL address changed).
Ah. Perfect. Ya I just switched the command to do a list source and that did the trick.
@eekamouse not really... See this comment: https://github.com/actions/virtual-environments/issues/3038#issuecomment-814904484
It wouldn't fix in this case if I was just doing the list command. The action does list for diagnostics but in case the source is not there (which happened for the comment linked above), it will be added. Per the various reports around the subject we have three issues:
The action also checks if the source is set up but disabled. I never hit this scenario, but if the source exists and is disabled, the action will re-enable it. All of this uses the dotnet nuget command (so it may not work in environments that don't have dotnet installed -- is there any windows runner without dotnet installed? :smile:)
We have another report, which looks like @asklar's case. The only difference we've found in the logs so far is the MSBuild version.
Failed:
2021-04-06T19:39:24.2927008Z MSBuild auto-detection: using msbuild version '16.9.0.16703' from 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\bin'. Use option -MSBuildVersion to force nuget to use a specific version of MSBuild.
Working:
2021-04-03T11:56:32.1196795Z MSBuild auto-detection: using msbuild version '16.9.0.11203' from 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\bin'. Use option -MSBuildVersion to force nuget to use a specific version of MSBuild.
For that particular customer changing .net version from 3.1 to 5 solves the issue
In our case we are using neither .net core 3 nor .net 5; we are using .net UWP.
These problems are also affecting Azure Pipelines. AFAIK the same machines are used under the hood.
Error NU1101: Unable to find package Microsoft.NETFramework.ReferenceAssemblies. No packages exist with this id in source(s): Microsoft Visual Studio Offline Packages
@asklar, @ArchieCoder, @fabriciomurta, @pellared, could you try this step as workaround?
dotnet nuget add source https://api.nuget.org/v3/index.json -n nuget.org --configfile $env:APPDATA\NuGet\NuGet.Config
@vsafonkin Your fix works AND "uses: fabriciomurta/ensure-nuget-source@v1" also works
Hello @AaronVanGeffen , do you have nuget.config file in your repo? I think it is known issue when you don't have config file. Could you please try to add simple nuget.config like described in the message: actions/setup-dotnet#155 (comment)
This works as well
@vsafonkin I've been looking at our windows-bound workflows over the last ~10 runs or so; we didn't hit the problematic environment, so I myself can't really tell whether the workarounds are fixing anything.
It may be the case, as pointed out at https://github.com/actions/virtual-environments/issues/3038#issuecomment-814857550, that I am hitting that scenario, and that simply checking the NuGet sources beforehand is "de-triggering" the issue. But I am a little skeptical about that; I just think I didn't hit the jackpot yet. Our project was not deterministically and insistently hitting the same instance.
Actually, I noticed if the issue triggered and I re-run the job, it was falling again on the broken instance. Initiating a new action run (via another workflow_dispatch, push, etc), was throwing me to an ok instance. But again, may have been coincidence.
btw, thanks to you all for the effort on fixing the issue and providing feedback! I will make sure to post an update here if I can find anything else that could help.
@asklar, @ArchieCoder, @pellared, @eekamouse, @rokups, @fabriciomurta, could you please try another workaround for your builds?
Remove-Item $env:APPDATA\NuGet\NuGet.Config
The cause of the issue is an empty config file in the user's appdata. We want to make sure that deleting this file works too. Thank you!
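As a workflow step, that deletion might look like the sketch below. The step name is illustrative, and -ErrorAction Ignore is an addition not present in the command above; it is there only so the step passes on runners where the file does not exist.

```yaml
# Illustrative step; the name is made up.
- name: Remove the user-level NuGet.Config (workaround under test)
  shell: pwsh
  run: |
    # -ErrorAction Ignore is an assumption added here so a missing file does not fail the step
    Remove-Item "$env:APPDATA\NuGet\NuGet.Config" -ErrorAction Ignore
```

Placing it before any restore step keeps the empty user-level file from being read during the restore.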
@vsafonkin You can make a fork from https://github.com/open-telemetry/opentelemetry-dotnet-instrumentation/commit/716974d3050afbb0d9cfa2310acfb4229ac6cb1f (commit before the workaround was applied) and experiment yourself 😉
Description
For the OpenLoco project, we gratefully make use of GitHub Actions for our CI. To this end, we set up a workflow for our Windows CI: https://github.com/OpenLoco/OpenLoco/blob/master/.github/workflows/ci.yml#L18-L59
The Visual Studio project this workflow builds has been set up to depend on a package from NuGet.org: https://github.com/OpenLoco/OpenLoco/blob/master/src/OpenLoco/openloco.vcxproj#L352-L354 https://www.nuget.org/packages/openloco.dependencies
This workflow worked fine until two days ago. However, we have noticed the package no longer gets retrieved properly, causing most (but not all) runs to fail.
Area for Triage:
C/C++
Question, Bug, or Feature?:
Bug
Virtual environments affected
Image version: 20210321.1 (broken), 20210316.1 (fine)
Expected behavior
Successful run on image version 20210316.1, 3 days ago https://github.com/OpenLoco/OpenLoco/runs/2181845951
Actual behavior
Failed run on 20210321.1 today: https://github.com/OpenLoco/OpenLoco/runs/2204773635
Repro steps
Please look at https://github.com/OpenLoco/OpenLoco/actions