Open msrb opened 3 years ago
Hello, any update here? This makes the non-rawhide gating take days in some cases. I'd really love to get this fixed.
FWIW, openQA schedules tests differently, and in a way that happens to handle this (mainly because we wrote the scheduling code before the koji-build-group.build.complete message existed).
We schedule on the bodhi.update.request.testing messages (also on bodhi.update.edit), and we have the scheduler figure out the list of NVRs in the update at the time of the message and pass those to the test system, which then downloads and tests those exact NVRs.
It would be easier if this info were available in the message, of course.
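A minimal sketch of the NVR-collection step described above, assuming the message body carries an update dict with a builds list whose entries each have an nvr key. Those field names and the function name are assumptions for illustration, not a confirmed description of the openQA scheduler, which may equally query Bodhi for the list.

```python
def nvrs_from_update_message(body):
    """Return the sorted NVRs listed in an update message body.

    Assumes (hypothetically) the shape:
        {"update": {"builds": [{"nvr": "foo-1.0-1.fc34"}, ...]}}
    """
    update = body.get("update", {})
    builds = update.get("builds", [])
    # Skip malformed entries rather than failing the whole scheduling run.
    return sorted(build["nvr"] for build in builds if "nvr" in build)
```

The resulting list is what would be handed to the test system so that it downloads and tests exactly those builds, even if the update is edited later.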
Further on this: the koji-build-group.build.complete messages are different from all the other update-related messages because they were changed specifically to fit Fedora CI's requirements. See https://pagure.io/fedora-ci/general/issue/70 and https://github.com/fedora-infra/bodhi/pull/3629 .
The build-group.build.complete messages for stable releases are sent, I believe, when an updates-testing push that includes the update is done, so the message indicates that the update should now be available from the updates-testing repository. For Rawhide (and Branched pre-Beta freeze) the updates-testing repository is not enabled, so this doesn't make sense, which is probably why the message is sent when the update is created.
I've noticed recently that bodhi.update.request.testing is not a perfect proxy for "update created" (which, see above, is how we treat it for openQA scheduling), because an update will not be submitted to testing on creation if there is a gating policy applicable to the update that gates push-to-testing. In that case the update will only be submitted to testing once that policy is satisfied. If it isn't - e.g. if a required CI test fails or isn't run - the update will not be submitted to testing, so openQA will not run on it.
To fix this I'm planning to have openQA schedule on build-group.build.complete messages as well as request.testing, but in order to achieve this without a lot of messing around I actually need the build-group.build.complete messages to look more like the other messages. Specifically, I need those messages to include the update dict that all the other messages have, because that representation has info in it which the artifact dict does not (like whether the update is critical path). I have sent a PR to do this.
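To illustrate why the update dict matters here, a rough sketch of a scheduler filter that accepts several topics but can only act on messages carrying an update dict. The topic suffixes come from this thread; the critpath key and both function names are assumptions made for the example.

```python
# Topic suffixes mentioned in this thread; the full topic carries a prefix
# such as "org.fedoraproject.prod.".
SCHEDULE_TOPIC_SUFFIXES = (
    "bodhi.update.request.testing",
    "bodhi.update.edit",
    "koji-build-group.build.complete",
)


def should_schedule(topic, body):
    """Schedule only on known topics whose body includes an update dict."""
    if not topic.endswith(SCHEDULE_TOPIC_SUFFIXES):
        return False
    # Without the update dict the scheduler cannot read update-level
    # properties, so such a message is skipped.
    return "update" in body


def is_critical_path(body):
    """Read the (assumed) critpath flag from the update dict, if present."""
    return bool(body.get("update", {}).get("critpath", False))
```

With this shape, a build-group.build.complete message that contains only an artifact dict is ignored, which is the gap the PR mentioned above is meant to close.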
Perhaps what we really both need, though, is an update.created message, which includes both update and artifact dicts, and is always sent on update creation regardless of what release we're talking about? So test systems which can test the update without waiting for it to actually appear in updates-testing (which seems to be both of ours) can test it promptly upon creation.
Having both dicts is kinda ugly, since they're really just two slightly different sets of information about the same thing (the update, and what it contains). But including both likely requires the least change to both existing schedulers. Combining all the info both schedulers require in a single dict would be neater, but would require more change to the schedulers, most likely.
So I've been thinking about this area a lot today, and made various false starts, but I think I have a plan that would work for everyone: https://pagure.io/fedora-ci/general/issue/436#comment-872389 . Thoughts welcome.
Rawhide gating is pretty straightforward: Bodhi sends bodhi.update.status.testing.koji-build-group.build.complete messages when the update is created in Bodhi (or at least that's what it looks like from the outside). CI can then trigger on the testing.koji-build-group.build.complete messages and test Rawhide builds. All good here.
For all non-Rawhide releases, the process is different. The testing.koji-build-group.build.complete messages are still being sent, but not immediately when the update is created. Is this intentional? It can take hours before the message is sent.
It looks like for non-Rawhide updates, Bodhi sends org.fedoraproject.prod.bodhi.update.request.testing messages when the update is created. The problem with bodhi.update.request.testing messages is that there is no Koji task id for builds listed inside the update. Some CI systems rely on task ids for testing.
I see 2 options here:
1) start sending testing.koji-build-group.build.complete immediately when a non-Rawhide update is created.
2) add Koji task id to bodhi.update.request.testing messages so CI systems can test the update easily (note the Koji task id is later present in the testing.koji-build-group.build.complete message, so I assume Bodhi knows the id (?))
WDYT?
Thanks :wink:
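For illustration, a small sketch of the task id problem from the CI side. It assumes the build-group.build.complete body has an artifact dict whose builds entries carry nvr and task_id, while the request.testing body has an update dict whose builds entries carry only nvr. The key names and function names are assumptions for the example, not a confirmed schema.

```python
def task_ids_for_update(body):
    """Map each NVR in a message body to its Koji task id, or None if absent."""
    builds = body.get("artifact", {}).get("builds")
    if builds is None:
        # No artifact dict: fall back to the update dict, where task ids
        # are (per the description above) not available.
        builds = body.get("update", {}).get("builds", [])
    return {build["nvr"]: build.get("task_id") for build in builds}


def missing_task_ids(body):
    """Return the NVRs a task-id-based CI system could not test."""
    task_ids = task_ids_for_update(body)
    return sorted(nvr for nvr, task_id in task_ids.items() if task_id is None)
```

Under option 2, missing_task_ids would return an empty list for request.testing messages as well, so a CI system keyed on task ids could start testing on update creation.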