Closed birdie-github closed 1 year ago
Thank you for the report @birdie-github, there are a lot of running builds so it looks fine to me.
I think somebody just submitted 3k builds at once. You can see the details here: https://copr.fedorainfracloud.org/status/pending/all/
Even though there is a large queue, users shouldn't experience any delays when submitting builds.
@FrostyX
I had a 20 minute delay trying to build my project which takes under a minute to build.
This looks like a perfect DDOS attack.
> This looks like a perfect DDOS attack
You are right. But people are doing mass rebuilds in Copr https://docs.pagure.org/copr.copr/user_documentation.html#mass-rebuilds
Basically, some people are rebuilding all Fedora packages with some new compiler setting, or rebuilding all python packages using different macros, etc.
So it likely isn't an intentional DDoS attack.
> I had a 20 minute delay trying to build my project which takes under a minute to build.
But you should not be affected by such rebuilds. It looks like we need to improve our build queue processing. Did it take long for all chroots or only some architectures?
@birdie-github thank you for reporting this, but I agree with @FrostyX that this is rather a common use-case of Fedora Copr. Build processing should be relatively fair even if other users produce large queues. Admittedly there are issues with ppc64le because of #2869, so you might be affected.
> Did it take long for all chroots or only some architectures?
I had a single arch for a single Fedora release. Basically a single build.
> that this is rather a common use-case of Fedora Copr.
OK, then, I'll now know that due to some mass rebuilds I need to wait.
Or maybe you could at least add a note, something like "COPR is overloaded right now, build times might be affected".
Or maybe you could improve queuing and don't allow people to start 500 jobs simultaneously.
Anyways, looks like it's just fine. Closing then.
> I had a single arch for a single Fedora release. Basically a single build.
Thank you for the build ID. Looking at this log, you are right that there is probably some problem.
> OK, then, I'll now know that due to some mass rebuilds I need to wait.
Not really. This is not what typically happens. We can have 300 or more build machines, and a single user can allocate at most 45 of them.
> Or maybe you could improve queuing and don't allow people to start 500 jobs simultaneously.
That's exactly what is being done, lemme check.
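To illustrate why a per-user cap should keep a small build from waiting behind a mass rebuild, here is a minimal Python sketch. It is not Copr's actual scheduler: the function name `assign_workers`, the user names, and the simple in-order scan are assumptions made for illustration. Only the figures 300 and 45 come from the comment above.

```python
from collections import Counter

MAX_WORKERS = 300   # total build machines (figure quoted in this thread)
MAX_PER_USER = 45   # per-user allocation cap (figure quoted in this thread)


def assign_workers(pending, running=None,
                   max_workers=MAX_WORKERS, max_per_user=MAX_PER_USER):
    """Return the pending builds that may start right now.

    pending: list of (user, build_id) tuples in submission order.
    running: optional mapping of user -> workers already in use.
    """
    in_use = Counter(running or {})
    free = max_workers - sum(in_use.values())
    started = []
    for user, build_id in pending:
        if free <= 0:
            break  # no free machines left
        if in_use[user] >= max_per_user:
            continue  # this user is at quota; keep scanning for other users
        in_use[user] += 1
        free -= 1
        started.append((user, build_id))
    return started


# Hypothetical scenario: one user queues 3000 builds, then another queues one.
pending = [("mass_rebuilder", i) for i in range(3000)]
pending.append(("single_build_user", 3000))
started = assign_workers(pending)
```

In this scenario only 45 of the 3000 mass-rebuild jobs get a machine, so the single build submitted last still starts immediately. That only holds if machines actually become available, which is why a long delay points to a problem elsewhere.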
The x86 machines started on our hypervisors (up to 80 I think) are having issues with this playbook task:
TASK [Activate Red Hat Subscription] *******************************************
Tuesday 22 August 2023 06:28:10 +0000 (0:00:00.044) 0:00:26.125 ********
The playbook times out on this, and we start over again with a new VM.
> The playbook times out on this, and we start over again with a new VM.
The reason for this was some dual-stack IP networking problem, though I'm not sure exactly what (a similar thing was happening with p08 boxes recently in #2869, actually). A reboot helped here.
For the Fedora Copr team, I created a convenient reboot trigger:
$ sudo rbac-playbook groups/copr-hypervisor.yml -l '*x86*' -t trigger_reboot
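The retry behavior described above (the playbook times out, the VM is discarded, a new one is started) can be sketched roughly as follows. This is an illustrative model in Python, not the Copr backend code: `get_ready_builder`, `ProvisionTimeout`, and the stand-in provisioning functions are all hypothetical names.

```python
import itertools


class ProvisionTimeout(Exception):
    """Raised when a (hypothetical) provisioning step exceeds its time limit."""


def get_ready_builder(spawn_vm, provision, max_attempts):
    """Spawn VMs until one provisions successfully.

    Returns (vm, attempts_used), or (None, max_attempts) if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        vm = spawn_vm()
        try:
            provision(vm)
        except ProvisionTimeout:
            continue  # discard this VM and start over with a new one
        return vm, attempt
    return None, max_attempts


_vm_ids = itertools.count(1)


def spawn_vm():
    """Stand-in for starting a VM; just hands out increasing IDs."""
    return next(_vm_ids)


def broken_provision(vm):
    """Models the stuck subscription task: every attempt times out."""
    raise ProvisionTimeout(f"VM {vm}: provisioning timed out")


def healthy_provision(vm):
    """Models the state after the reboot: provisioning succeeds."""


stalled = get_ready_builder(spawn_vm, broken_provision, max_attempts=5)
recovered = get_ready_builder(spawn_vm, healthy_provision, max_attempts=5)
```

While provisioning keeps timing out, VMs are spawned and thrown away without ever producing a usable builder, so builds wait regardless of how fair the queue is. Once provisioning works again, the first new VM becomes a builder.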
@birdie-github once again, thank you very much for reporting this!
@praiskup can you please document the `sudo rbac-playbook groups/copr-hypervisor.yml -l '*x86*' -t trigger_reboot` command?
Documented in #2969