NuGet / NuGetGallery

NuGet Gallery is a package repository that powers https://www.nuget.org. Use this repo for reporting NuGet.org issues.
Apache License 2.0

[NuGet.org Bug]: Facing 504/502 errors while restoring packages in AWS eu-central-1 region #9775

Closed: omer-agranov closed this issue 6 months ago

omer-agranov commented 6 months ago

Impact

I'm unable to use NuGet.org

Describe the bug

While running `dotnet restore`, we started getting 502 and 504 errors from NuGet.

To test, we ran curl against this URL: https://api.nuget.org/v3-flatcontainer/awssdk.core/index.json

We ran it from machines in AWS availability zones eu-central-1a and eu-central-1b, where it fails most of the time (but not always). From eu-central-1c it worked every time we ran the command.

When the curl request fails, instead of the package index we get the following message:

```
Our services aren't available right now

We're working to restore all services as soon as possible. Please check back soon.

20240111T122751Z-dp7f50w70h3shb2hut61khm0p400000003e000000000x5q3
```

Repro Steps

Run curl https://api.nuget.org/v3-flatcontainer/awssdk.core/index.json from AWS machines in availability zones eu-central-1a and eu-central-1b; it fails most of the time.

Repeat the same from eu-central-1c; it succeeds.
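
To turn "fails most of the time (but not always)" into numbers that can be compared across availability zones, the repro can be scripted. The following is a minimal sketch, assuming Python 3 with only the standard library; the helper names `classify` and `probe`, the attempt count and the timeout are illustrative choices, not part of the original report. It counts an attempt as successful only when the response is HTTP 200 and the body is a JSON object with a `versions` array, which is the shape of the flat-container package index.

```python
import json
import urllib.error
import urllib.request
from collections import Counter

URL = "https://api.nuget.org/v3-flatcontainer/awssdk.core/index.json"


def classify(status: int, body: bytes) -> str:
    """Label one response; 'ok' only for HTTP 200 with a JSON index containing 'versions'."""
    if status != 200:
        return f"http {status}"
    try:
        doc = json.loads(body)
    except ValueError:
        # A 200 response that is not JSON (e.g. an HTML page) is not a usable index.
        return "non-json body"
    if isinstance(doc, dict) and isinstance(doc.get("versions"), list):
        return "ok"
    return "unexpected json"


def probe(url: str, attempts: int = 20, timeout: float = 10.0) -> Counter:
    """Request `url` repeatedly and tally the outcome of each attempt."""
    tally: Counter = Counter()
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                tally[classify(resp.status, resp.read())] += 1
        except urllib.error.HTTPError as err:
            # 502/504 responses surface here; the body is irrelevant for non-200 codes.
            tally[classify(err.code, b"")] += 1
        except OSError as err:
            # DNS failures, connection resets, timeouts, etc.
            tally[f"network error: {type(err).__name__}"] += 1
    return tally
```

For example, running `print(probe(URL))` once from a host in each zone gives a per-zone tally of successes and 502/504 responses that can be attached to a report like this one.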

Expected Behavior

It should work from every availability zone.

Screenshots

No response

Additional Context and logs

No response

dhavalbkn commented 6 months ago

We are also facing the same issue in AWS region eu-south-1:

Retrying 'FindPackagesByIdAsync' for source 'https://api.nuget.org/v3-flatcontainer/microsoft.extensions.configuration.json/index.json'.
  Response status code does not indicate success: 502 (Bad Gateway).
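
The `Retrying 'FindPackagesByIdAsync'` line shows the NuGet client retrying a request after a 502. The following sketch illustrates the general retry-with-exponential-backoff pattern for transient gateway errors; it is not NuGet's actual implementation, and the retryable status set, attempt count, base delay and the hypothetical `fetch` callable are all assumptions for illustration.

```python
import time
from typing import Callable, Tuple

# Assumption for illustration: treat these gateway/availability errors as transient.
RETRYABLE = {502, 503, 504}


def fetch_with_retry(
    fetch: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call the hypothetical `fetch` (returning (status, body)) with exponential backoff.

    Retryable statuses are retried up to `max_attempts` total attempts, waiting
    base_delay, 2*base_delay, 4*base_delay, ... between them. Any other status
    is returned immediately, as is the result of the final attempt.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for _ in range(max_attempts - 1):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        sleep(delay)  # back off before the next attempt
        delay *= 2
    return fetch()  # final attempt: return whatever it yields
```

Injecting `sleep` keeps the function testable; in real use the default `time.sleep` applies. Retries only help with intermittent failures like the ones in the original report; when every request fails, as in the comment below, the client eventually gives up regardless of backoff.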

KUTlime commented 6 months ago

Same here. I have been seeing it since yesterday, but today it fails every time and none of my build runs can finish. The release pipeline is stuck on this.

ghost commented 6 months ago

There are some other recent, similar complaints here and here.

zhhyu commented 6 months ago

My sincere apologies for the inconvenience! Our primary CDN provider is having an outage, and we have switched the traffic to the other CDN. Please let us know if you see that the issue is still ongoing.

ghost commented 6 months ago

It's working for us right now.

cian-sheehy commented 6 months ago

Why wasn't the CDN degradation issue posted on status.nuget.com yesterday? Issues happen and things fall over, but it would be great to see these updates/issues on the status page going forward.

zhhyu commented 5 months ago

> Why wasn't the CDN degradation issue posted on status.nuget.com yesterday? Issues happen and things fall over, but it would be great to see these updates/issues on the status page going forward.

Yes, this is not good. We have a monitoring gap there: the failover should have been executed hours earlier, as soon as the outage of our dependency started.

We will improve our monitoring system and look for a better, more automated way to update the status page.

Thank you for sharing the concern!
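
The gap described above is that failover was not triggered when the dependency started failing. One common way to catch that kind of outage is to run synthetic probes (such as the curl request from the report) and fail over once the error rate over a sliding window crosses a threshold. The following is a minimal, hypothetical sketch of that pattern, not a description of NuGet.org's actual monitoring; `ErrorRateMonitor`, the window size, the 20% threshold and the minimum sample count are all assumptions.

```python
from collections import deque


class ErrorRateMonitor:
    """Hypothetical sliding-window monitor over synthetic probe results."""

    def __init__(self, window: int = 50, threshold: float = 0.2, min_samples: int = 20):
        # Oldest results drop off automatically once `window` entries are stored.
        self.results: deque = deque(maxlen=window)  # True means the probe failed
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, status: int) -> None:
        # Count 5xx responses (e.g. 502/504 from the CDN) as failures.
        self.results.append(status >= 500)

    def error_rate(self) -> float:
        return sum(self.results) / len(self.results) if self.results else 0.0

    def should_fail_over(self) -> bool:
        # Require a minimum number of samples so a single blip does not trigger failover.
        return len(self.results) >= self.min_samples and self.error_rate() >= self.threshold
```

The same signal that drives `should_fail_over()` could also be used to post an incident to a status page automatically, which is the kind of automation discussed in this thread.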

cian-sheehy commented 5 months ago

Thank you @zhhyu. That's great to hear.