mizdebsk closed this issue 4 years ago.
This is caused by a combination of several factors:
Fixing any single one of the above issues should fix the whole problem. IMO the best long-term fix is to fix item 1 (by adding a rebuild_srpm option to BuildTask in Koji, and then making Koschei submit scratch builds with rebuild_srpm=False).
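A minimal sketch of the Koschei side of that fix, assuming Koji's `build` call accepts the proposed `rebuild_srpm` key in its `opts` dictionary. The option name comes from the proposal above; how (or whether) a given Koji version exposes it is an assumption. `KojiSession`, `submit_scratch_build` and `RecordingSession` are hypothetical helpers; only the `build(src, target, opts, priority=...)` call shape mirrors Koji's hub API.

```python
from __future__ import annotations

from typing import Any, Protocol


class KojiSession(Protocol):
    """The one koji.ClientSession method this sketch relies on."""

    def build(self, src: str, target: str, opts: dict[str, Any] | None = None,
              priority: int | None = None) -> int: ...


def submit_scratch_build(session: KojiSession, src: str, target: str,
                         priority: int | None = None) -> int:
    """Submit a scratch build of `src` (an uploaded SRPM path or SCM URL).

    Assumes the hub honours the proposed "rebuild_srpm" option, so that no
    rebuildSRPM subtask is created and the SRPM is built as-is.
    """
    opts: dict[str, Any] = {
        "scratch": True,        # scratch: not recorded as a regular Koji build
        "rebuild_srpm": False,  # proposed option: skip rebuilding the SRPM
    }
    return session.build(src, target, opts, priority=priority)


class RecordingSession:
    """Hypothetical stand-in that records calls instead of talking to a hub."""

    def __init__(self) -> None:
        self.calls = []  # list of (src, target, opts, priority) tuples

    def build(self, src: str, target: str, opts: dict[str, Any] | None = None,
              priority: int | None = None) -> int:
        self.calls.append((src, target, opts, priority))
        return len(self.calls)  # fake task ID
```

Passing a duck-typed session keeps the helper testable without a Koji hub; in real use `session` would be an authenticated `koji.ClientSession`.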
Sounds good. I wonder if we could also remove (or not keep) buildroots for Koschei jobs?
Taking the aforementioned day, Jan 27th, as an example: Koschei submitted 18502 scratch builds, including 661 scratch builds for python-debtcollector, all of which failed, probably all due to rebuildSRPM task failures. An example of such a scratch build: https://koji.fedoraproject.org/koji/taskinfo?taskID=41111817
I will try to reproduce the issue as a unit test.
Verified in staging as follows:
Reproduced the issue in staging Koschei:
Consuming message from topic org.fedoraproject.stg.buildsys.task.state.change (message id 0b0fdb3e-2e18-476b-925a-3586ac4a25d7)
Setting build Build(id=21576, package=python-debtcollector, collection=f32, state=running, task_id=90009534) state to failed
Deleting build Build(id=21576, package=python-debtcollector, collection=f32, state=running, task_id=90009534) because it has no repo_id
Successfully consumed message from topic org.fedoraproject.stg.buildsys.task.state.change (message id 0b0fdb3e-2e18-476b-925a-3586ac4a25d7)
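The log above shows Koschei reacting to a Koji `task.state.change` message. A minimal sketch of that first step, picking finished tasks out of such messages, assuming the message body carries the Koji task ID in `"id"` and the new task state name in `"new"`. The function name and the mapping from Koji task states to Koschei build states are hypothetical.

```python
from __future__ import annotations

from typing import Any

TOPIC_SUFFIX = ".buildsys.task.state.change"

# Assumed mapping from final Koji task state names (as sent in "new")
# to Koschei-style build states; non-final states are ignored.
FINAL_STATES = {"CLOSED": "complete", "FAILED": "failed"}


def parse_task_state_change(topic: str,
                            body: dict[str, Any]) -> tuple[int, str] | None:
    """Return (task_id, new_build_state) for a finished Koji task, else None."""
    if not topic.endswith(TOPIC_SUFFIX):
        return None  # some other message type
    new_state = FINAL_STATES.get(body.get("new", ""))
    if new_state is None:
        return None  # e.g. FREE -> OPEN: the task is still in progress
    return int(body["id"]), new_state
```

Matching on the topic suffix lets the same code accept both the `org.fedoraproject.stg.` staging topic seen in the log and the production topic.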
Then I deployed the fixed version and retested. A new build was submitted, and it failed: https://koji.stg.fedoraproject.org/koji/taskinfo?taskID=90009536
This time the build was not removed from the Koschei DB: https://koschei.stg.fedoraproject.org/build/21577
Consuming message from topic org.fedoraproject.stg.buildsys.task.state.change (message id 2c7dfc94-c8e0-416d-9d68-3044c7f8c494)
Setting build Build(id=21577, package=python-debtcollector, collection=f32, state=running, task_id=90009536) state to failed
Successfully consumed message from topic org.fedoraproject.stg.buildsys.task.state.change (message id 2c7dfc94-c8e0-416d-9d68-3044c7f8c494)
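The difference between the two log excerpts can be captured in a unit-test-sized model. This is a hypothetical in-memory sketch, not Koschei's actual code: `Build`, `on_task_failed` and the `delete_without_repo` flag are invented for illustration, with the flag standing in for the pre-fix and fixed behavior respectively.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Build:
    """Minimal stand-in for a Koschei build row (hypothetical)."""
    id: int
    package: str
    task_id: int
    state: str = "running"
    repo_id: int | None = None


def on_task_failed(db: dict[int, Build], task_id: int,
                   delete_without_repo: bool) -> Build | None:
    """Mark the build for `task_id` as failed and return it if it is kept."""
    build = next((b for b in db.values() if b.task_id == task_id), None)
    if build is None:
        return None  # task not tracked by this instance
    build.state = "failed"
    if delete_without_repo and build.repo_id is None:
        # Pre-fix behaviour from the first log excerpt: a failed build
        # without repo_id is deleted from the DB.
        del db[build.id]
        return None
    return build  # fixed behaviour: the failed build stays recorded
```

A test can then assert that, with the fixed behavior, build 21577 is still present and marked failed.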
Therefore I consider the fix to be verified in staging.
The fix was deployed to production.
Originally reported by @nirik at https://pagure.io/koschei/issue/2