fedora-copr / copr

RPM build system - upstream for https://copr.fedorainfracloud.org/

Problems with spawning new builders caused by subscription-manager #3380

Closed FrostyX closed 1 day ago

FrostyX commented 3 weeks ago

There is something wrong with our subscription-manager. Spawning new builders fails with:

TASK [Check that we have successfully finished the subscription] ***************
Wednesday 21 August 2024  12:24:58 +0000 (0:00:03.538)       0:03:33.162 ******
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (100 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (99 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (98 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (97 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (96 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (95 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (94 retries left).
FAILED - RETRYING: [2620:52:3:1:dead:beef:cafe:c208]: Check that we have successfully finished the subscription (93 retries left).
...
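
For readers unfamiliar with this pattern: the task in the log above polls a check until it passes or the retry budget runs out. A minimal shell sketch of the same idea follows. The function name `retry_until` and its arguments are hypothetical, and the command the real playbook runs as its check is not shown in this thread.

```shell
# retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_TRIES
# attempts have been made. Returns 0 on success, 1 if every attempt failed.
# Note: plain POSIX sh has no local variables, so these names are global.
retry_until() {
    max_tries=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        attempt=$((attempt + 1))
        # Sleep only if another attempt is coming.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumption: `subscription-manager status` stands in for the
# playbook's real check, which this thread does not show):
#   retry_until 100 3 subscription-manager status
```

A bounded loop like this is why a broken RHSM backend shows up as builders that take minutes to fail rather than as an immediate error.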
xsuchy commented 3 weeks ago

The problem is that subscription-manager fails because, when I open subscription.rhsm.redhat.com, I get the error SEC_ERROR_UNKNOWN_ISSUER.

I reported it to the chainsaw team (the subscription-manager team) and opened a Red Hat IT ticket. Unfortunately, it has very low priority (because there is only one report). If you want to get this resolved, please open a ticket at https://help.redhat.com - the more tickets are opened, the higher the priority will be.

ekohl commented 3 weeks ago

Normally the certificate is shipped in subscription-manager-rhsm-certificates. On CentOS Stream 9:

# dnf install subscription-manager-rhsm-certificates -yq
# rpm -qv subscription-manager-rhsm-certificates
subscription-manager-rhsm-certificates-20220623-1.el9.noarch
# curl --cacert /etc/rhsm/ca/redhat-uep.pem https://subscription.rhsm.redhat.com
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>openresty</center>
</body>
</html>

Perhaps there's an old version of subscription-manager-rhsm-certificates installed?
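
The check above can be wrapped so that a TLS trust failure is told apart from other errors. This is only a sketch: the helper names `describe_curl_exit` and `check_tls` are hypothetical, and the exit codes are the ones documented in the curl man page (60 for a peer certificate that cannot be verified against the given CA, 77 for an unreadable CA file). A 404 page, as in the output above, still means the TLS handshake succeeded, because curl exits 0 on HTTP errors unless `--fail` is given.

```shell
# Map a curl exit status to a short diagnosis (hypothetical labels).
describe_curl_exit() {
    case "$1" in
        0)  echo "tls-ok" ;;
        60) echo "untrusted-issuer" ;;
        77) echo "ca-file-unreadable" ;;
        *)  echo "other-error" ;;
    esac
}

# check_tls CA_FILE URL
# Hypothetical helper: fetches URL, trusting only the CA bundle in CA_FILE,
# and prints the diagnosis for curl's exit status.
check_tls() {
    rc=0
    curl --silent --output /dev/null --max-time 30 --cacert "$1" "$2" || rc=$?
    describe_curl_exit "$rc"
}

# Example, using the path and host from the transcript above:
#   check_tls /etc/rhsm/ca/redhat-uep.pem https://subscription.rhsm.redhat.com
```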

sbrivio-rh commented 3 weeks ago

ppc64le builders started now? (https://copr.fedorainfracloud.org/coprs/sbrivio/passt/build/7930789/)

praiskup commented 3 weeks ago

I think RHSM got fixed; we already have ~160 builders now.

pgnd commented 3 weeks ago

> I think RHSM got fixed; we already have ~160 builders now.

Do we need to cancel & resubmit builds that are still stuck @ 'pending'? Or just wait until they're automatically picked up?

FrostyX commented 3 weeks ago

> Or just wait until they're automatically picked up?

They should get picked up.

sbrivio-rh commented 3 weeks ago

My pending ones were picked up, or are being picked up (https://copr.fedorainfracloud.org/coprs/sbrivio/passt/build/7930789/).

pgnd commented 3 weeks ago

noted, thx. looks like a busy backlog -- yours submitted ~ 7hrs ago? mine's just

Status:
    pending - Build is waiting in queue for a backend worker. 
Submitted:
    2024-08-21 14:20 EDT (32 minutes ago) 

as usual, bad-timing on my part ;-)

sbrivio-rh commented 3 weeks ago

I guess the notice on Copr could be removed now...?

praiskup commented 3 weeks ago

No, actually the problem is back... :-( The queue is growing again.

xsuchy commented 3 weeks ago

We are seeing two issues now: 1) subscription-manager takes a long time to proceed (4 minutes). 2) It sometimes fails with an internal server error response from the server.
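
To tell these two failure modes apart in a script, one option is to run the command under a deadline and classify how it ended. A rough sketch, assuming GNU coreutils `timeout` is available (it exits with status 124 when the deadline is hit); the function name `run_with_deadline` is hypothetical.

```shell
# run_with_deadline SECONDS COMMAND [ARGS...]
# Hypothetical helper: prints "ok", "timed-out", or "failed" depending on
# how COMMAND ended. COMMAND must be an external program, not a shell
# function, because `timeout` executes it directly.
# Caveat: a command that itself exits with 124 is reported as "timed-out".
run_with_deadline() {
    deadline=$1
    shift
    rc=0
    timeout "$deadline" "$@" >/dev/null 2>&1 || rc=$?
    case "$rc" in
        0)   echo "ok" ;;
        124) echo "timed-out" ;;
        *)   echo "failed" ;;
    esac
}

# Example (assumption: which subscription-manager call is slow is not
# stated in this thread, so `status` is only a placeholder):
#   run_with_deadline 300 subscription-manager status
```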

praiskup commented 3 weeks ago

These problems no longer occur, but we have no feedback from RH IT. :shrug:

praiskup commented 2 weeks ago

Reopening. The problem is back.

praiskup commented 2 weeks ago

And RHSM works again.

praiskup commented 2 weeks ago

And the problem is back again :-(

praiskup commented 1 day ago

This has been fixed on the RHSM side (for now at least), and from the info we got off-list, these problems were caused by high/peak RHSM use (or a DDoS). We were told to monitor status.redhat.com next time, and to subscribe there to be informed about possible future outages.