COPR can basically be DDOS'ed but it's by design

fedora-copr / copr

RPM build system - upstream for https://copr.fedorainfracloud.org/

COPR can basically be DDOS'ed but it's by design #2883

Closed: birdie-github closed this issue 1 year ago

birdie-github commented 1 year ago

down

FrostyX commented 1 year ago

Thank you for the report, @birdie-github. There are a lot of running builds, so it looks fine to me.

I think somebody just submitted 3k builds at once; you can see the details here: https://copr.fedorainfracloud.org/status/pending/all/

Even though there is a large queue, users shouldn't experience any delays when submitting builds.

birdie-github commented 1 year ago

@FrostyX

I had a 20 minute delay trying to build my project which takes under a minute to build.

This looks like a perfect DDOS attack.

FrostyX commented 1 year ago

> This looks like a perfect DDOS attack

You are right, but people do mass rebuilds in Copr: https://docs.pagure.org/copr.copr/user_documentation.html#mass-rebuilds

Basically, some people are rebuilding all Fedora packages with some new compiler setting, or rebuilding all python packages using different macros, etc.

So it likely isn't an intended DDoS attack.

> I had a 20 minute delay trying to build my project which takes under a minute to build.

But you should not be affected by such rebuilds. It looks like we need to improve our build queue processing. Did it take long for all chroots or only some architectures?

praiskup commented 1 year ago

@birdie-github thank you for reporting this, but I agree with @FrostyX that this is a rather common use case of Fedora Copr. Build processing should be relatively fair even if other users produce large queues. Admittedly, there are issues with ppc64le because of #2869, so you might be affected.

birdie-github commented 1 year ago

> Did it take long for all chroots or only some architectures?

I had a single arch for a single Fedora release. Basically a single build.

birdie-github commented 1 year ago

> that this is rather a common use-case of Fedora Copr.

OK, then, I'll now know that due to some mass rebuilds I need to wait.

Or maybe you could at least add a note, something like "COPR is overloaded right now, build times might be affected".

Or maybe you could improve queuing and don't allow people to start 500 jobs simultaneously.

Anyways, looks like it's just fine. Closing then.

praiskup commented 1 year ago

> I had a single arch for a single Fedora release. Basically a single build.

Thank you for the build ID. Looking at this log, you are right that there is probably some problem.

> OK, then, I'll now know that due to some mass rebuilds I need to wait.

Not really. This is not what typically happens. We can have 300 or more build machines, and a single user can allocate at most 45 of them.
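To illustrate why a per-user cap keeps a mass rebuild from starving small submitters, here is a minimal sketch. It is not Copr's actual scheduler: the function name `assign_builds`, the `(user, build_id)` tuple layout, and the user names are hypothetical, and only the figures 300 and 45 come from this comment.

```python
from collections import Counter, deque


def assign_builds(pending, free_workers, running_per_user, per_user_cap=45):
    """Choose which pending builds may start now.

    pending: list of (user, build_id) in submission order (hypothetical layout).
    running_per_user: mapping of user -> builds already holding a worker.
    Returns (started, still_pending), both preserving submission order.
    """
    running = Counter(running_per_user)
    started, deferred = [], []
    queue = deque(pending)
    while queue and free_workers > 0:
        user, build_id = queue.popleft()
        if running[user] >= per_user_cap:
            # This user is at the cap: skip the build but keep it queued.
            deferred.append((user, build_id))
            continue
        running[user] += 1
        free_workers -= 1
        started.append((user, build_id))
    # Builds skipped because of the cap stay ahead of the unscanned remainder.
    return started, deferred + list(queue)


# One user submits 3000 builds, then another user submits a single build.
pending = [("mass-rebuilder", i) for i in range(3000)] + [("small-user", 3000)]
started, waiting = assign_builds(pending, free_workers=300, running_per_user={})
print(len(started), len(waiting))  # 46 2955
```

In this sketch the single build from `small-user` starts immediately even though it was submitted after 3000 others, because the large submitter is held at 45 workers.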

> Or maybe you could improve queuing and don't allow people to start 500 jobs simultaneously.

That's exactly what is being done, lemme check.

praiskup commented 1 year ago

The x86 machines started on our hypervisors (up to 80 I think) are having issues with this playbook task:

TASK [Activate Red Hat Subscription] *******************************************
Tuesday 22 August 2023  06:28:10 +0000 (0:00:00.044)       0:00:26.125 ********

The playbook times out on this, and we start over again with a new VM.
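The failure mode described here (provisioning times out, the VM is thrown away, and a fresh one is tried) can be sketched as a retry loop. This is only an illustration, not the Copr backend code: `allocate_vm`, `run_playbook`, and `discard_vm` are hypothetical callables, and the `max_attempts` limit is an assumption added so that a systematic failure surfaces as an error instead of looping silently.

```python
def provision_worker(allocate_vm, run_playbook, discard_vm, max_attempts=3):
    """Try to provision a build VM, replacing it whenever the playbook times out.

    All three callables are hypothetical stand-ins:
      allocate_vm() -> vm handle
      run_playbook(vm) -> raises TimeoutError if a task hangs
      discard_vm(vm) -> releases the broken VM
    Returns (vm, attempts_used) on success.
    """
    for attempt in range(1, max_attempts + 1):
        vm = allocate_vm()
        try:
            run_playbook(vm)
        except TimeoutError:
            # Provisioning hung: drop this VM and start over with a new one.
            discard_vm(vm)
            continue
        return vm, attempt
    # Every attempt timed out, which suggests a host-level problem
    # rather than one unlucky VM.
    raise RuntimeError(f"provisioning timed out {max_attempts} times in a row")
```

If every new VM hits the same timeout, as happened here, each retry only consumes time, so builds wait even though machines appear to be starting.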

praiskup commented 1 year ago

> The playbook times out on this, and we start over again with a new VM.

The cause was some dual-stack IP networking problem; I am not sure exactly what (a similar thing was happening with the p08 boxes recently in #2869, actually). A reboot helped here.

For the Fedora Copr team, I created a convenient reboot trigger: $ sudo rbac-playbook groups/copr-hypervisor.yml -l '*x86*' -t trigger_reboot

praiskup commented 1 year ago

@birdie-github once again, thank you very much for reporting this!

nikromen commented 1 year ago

@praiskup can you please document the sudo rbac-playbook groups/copr-hypervisor.yml -l '*x86*' -t trigger_reboot command?

praiskup commented 1 year ago

Documented in #2969