Closed riarenas closed 4 years ago
Took me a while to track down the exact configuration for NuGet.config (it's not documented yet). Updated the issue description with it:
<config>
<add key='maxHttpRequestsPerSource' value='2' />
</config>
According to AzDO telemetry, a large number of our AzDO build pool machines are being detected as the same machine, which puts them all in the same throttling bucket and makes things a lot worse for our BYOC pools than for the hosted ones.
As a short-term solution, the concurrency limits for the dnceng instance have been increased, and it seems to have helped somewhat.
The Azure DevOps team has found that we are now hitting some other throttling limits, and are in the process of investigating.
The AzDO team is also considering some long-term solutions like filling the gaps that are blocking us from adopting upstream feeds, which would reduce the number of feeds we need to specify in our repos' NuGet.config, and looking at improvements with the NuGet team.
Runtime-coreclr outerloop hit this multiple times yesterday.
https://dev.azure.com/dnceng/public/_build/results?buildId=447821 https://dev.azure.com/dnceng/public/_build/results?buildId=447950 https://dev.azure.com/dnceng/public/_build/results?buildId=448056 https://dev.azure.com/dnceng/public/_build/results?buildId=448358 https://dev.azure.com/dnceng/public/_build/results?buildId=448386 https://dev.azure.com/dnceng/public/_build/results?buildId=448795
@riarenas - Is this still a thing?
Yes. We have had some quota increases to help with this in the short term, but we haven't heard back with a long term solution.
We've received additional reports of 429s during restore operations in attempt 1 of these builds:
https://dev.azure.com/dnceng/internal/_build/results?buildId=587047&_a=summary https://dev.azure.com/dnceng/internal/_build/results?buildId=587046&_a=summary https://dev.azure.com/dnceng/internal/_build%2Fresults?buildId=587045&_a=summary
I reached out in the thread we had with the Azure Artifacts group about throttling.
CC @wtgodbe
The Azure Artifacts team said the problems on 4/3 were due to dnceng using 60% of the traffic for their scale units when they were completely scaled down. Additionally, yesterday we saw IP throttling come back in a lot more cases:
Using runfo I was able to find these from runtime, but we have additional reports from Roslyn, where the error was reported as a timeout instead of a build failure error.
I asked the artifacts team for increased quota as we're ramping up usage of the feeds.
@riarenas thanks for looking into this, do we have an ETA?
Would it make sense to bring the dotnet blob feed back as a restore source in the meantime?
No ETA.
I'll create PRs to re-add dotnet-core as a backup if we don't hear from them soon.
Thanks @riarenas
AzDO folks have increased our limits. I'll keep this in FR for a bit to see if this gives us some relief, and move it back to general tracking afterwards, as our feed usage is only going to increase in the near term. (we haven't moved ASPNet or Installer to relying entirely on azdo feeds yet)
The new limits seem to have stuck. Haven't seen any more throttling during restore since the limits were raised. I'll remove this from FR.
The AzDO team said they are evaluating more sustainable options to handle our load. I'll keep this open because I think if we onboarded another big repo to only using azdo feeds, we'd start seeing this again.
We have reached out to the Azure Artifacts team again for options: https://github.com/dotnet/core-eng/issues/9681
Ok - AzDO is saying they've fixed it (for real) now.
After the recent AzDO changes it doesn't look like we'll be easily throttled again, so I don't think there's much worth in keeping this long standing issue open anymore. We can open new issues for any sporadic throttling we see.
We have started seeing some throttling errors when attempting to restore NuGet packages from Azure Artifacts that manifest like this:
This does not appear to be causing widespread failures so far, but as we increase our reliance on these feeds, we're starting to get more reports.
Example builds where this has been seen: https://dev.azure.com/dnceng/public/_build/results?buildId=398862 https://dnceng.visualstudio.com/internal/_build/results?buildId=384184
AzDO throws all unauthenticated NuGet requests into the same throttling bucket. Because NuGet looks up each package on every configured feed, the problem gets worse the more feeds you have in your NuGet.config and the more packages you need to restore.
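To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes one lookup per package per configured feed, which is a simplification: real restores can issue several HTTP requests per lookup and also benefit from caching. The function name and the package and feed counts are hypothetical, chosen only for illustration.

```python
def estimated_feed_lookups(num_packages: int, num_feeds: int) -> int:
    """Rough estimate of feed lookups for one restore.

    Assumption (not from AzDO or NuGet docs): every package is looked up
    once on every feed listed in NuGet.config.
    """
    return num_packages * num_feeds


if __name__ == "__main__":
    packages = 200  # hypothetical number of packages in a restore
    for feeds in (1, 5, 10):  # hypothetical feed counts in NuGet.config
        lookups = estimated_feed_lookups(packages, feeds)
        print(f"{feeds} feed(s): ~{lookups} lookups")
```

Under this simplified model, cutting the number of feeds in NuGet.config reduces the request volume proportionally, which is why fewer feeds should ease the throttling.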
The short term suggestions from the AzDO team are:
We are still in talks with the AzDO team, as these workarounds would end up requiring a lot of changes to our infrastructure.