Better messaging for non-Rawhide releases

msrb commented 3 years ago

Rawhide gating is pretty straightforward: Bodhi sends bodhi.update.status.testing.koji-build-group.build.complete messages when the update is created in Bodhi (or at least that's how it looks from the outside). CI can then trigger on the testing.koji-build-group.build.complete messages and test Rawhide builds. All good here.

For all non-Rawhide releases, the process is different. The testing.koji-build-group.build.complete messages are still being sent, but not immediately when the update is created. Is this intentional? It can take hours before the message is sent.

It looks like for non-Rawhide updates, Bodhi sends org.fedoraproject.prod.bodhi.update.request.testing messages when the update is created. The problem with bodhi.update.request.testing messages is that they do not include the Koji task IDs of the builds in the update, and some CI systems rely on task IDs for testing.

I see two options here:

1) Start sending testing.koji-build-group.build.complete messages immediately when a non-Rawhide update is created.
2) Add the Koji task IDs to bodhi.update.request.testing messages so CI systems can test the update easily. The Koji task ID does show up later in the testing.koji-build-group.build.complete message, so I assume Bodhi already knows it (?).
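
To make the gap concrete, here is a minimal sketch of what a task-ID-based CI consumer can and cannot extract from each message. The body layouts (`artifact.builds[].task_id` for build-group messages, NVR-only `update.builds[]` for request.testing) are assumptions based on the Fedora CI message format, and `koji_task_ids` is a hypothetical helper, not part of Bodhi or fedora-messaging.

```python
from typing import Any, Dict, List, Optional

# Topic suffixes; the full topic carries an environment prefix such as
# "org.fedoraproject.prod.", so matching on the suffix keeps this portable.
BUILD_GROUP_SUFFIX = "koji-build-group.build.complete"
REQUEST_TESTING_SUFFIX = "bodhi.update.request.testing"


def koji_task_ids(topic: str, body: Dict[str, Any]) -> Optional[List[int]]:
    """Return the Koji task IDs in a Bodhi message, or None if it has none.

    ASSUMED body layouts (not checked against a schema):
      build-group.build.complete: body["artifact"]["builds"][i]["task_id"]
      request.testing:            body["update"]["builds"][i]["nvr"] only
    """
    if topic.endswith(BUILD_GROUP_SUFFIX):
        return [build["task_id"] for build in body["artifact"]["builds"]]
    if topic.endswith(REQUEST_TESTING_SUFFIX):
        # The update dict lists builds by NVR only, so a task-ID-based
        # CI system has nothing to schedule from; option 2 would fix this.
        return None
    raise ValueError(f"unexpected topic: {topic}")
```

Under option 1 the first branch would fire at update creation for every release; under option 2 the second branch could return task IDs instead of `None`.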

WDYT?

Thanks :wink:

rh-mcermak commented 3 years ago

Hello, any update here? This makes non-Rawhide gating take days in some cases. I'd really love to get this fixed.

AdamWill commented 2 years ago

FWIW, openQA schedules tests differently, and in a way that happens to handle this (mainly because we wrote the scheduling code before the koji-build-group.build.complete message existed).

We schedule on the bodhi.update.request.testing messages (also on bodhi.update.edit), and we have the scheduler figure out the list of NVRs in the update at the time of the message and pass those to the test system, which then downloads and tests those exact NVRs.
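
A minimal sketch of this scheduling approach, assuming the message body embeds the update dict with an `alias` and a `builds` list of `{"nvr": ...}` entries. `on_message` and `submit` are hypothetical names; the real openQA scheduler may determine the NVRs differently.

```python
from typing import Any, Callable, Dict, List

# Topics this style of scheduler reacts to, matched by suffix so the
# environment prefix (e.g. "org.fedoraproject.prod.") does not matter.
SCHEDULE_SUFFIXES = ("bodhi.update.request.testing", "bodhi.update.edit")


def update_nvrs(update: Dict[str, Any]) -> List[str]:
    """List the NVRs in the update as of this message (ASSUMED dict layout)."""
    return [build["nvr"] for build in update["builds"]]


def on_message(topic: str, body: Dict[str, Any],
               submit: Callable[[str, List[str]], None]) -> bool:
    """Hypothetical consumer: hand the exact NVRs to the test system."""
    if not topic.endswith(SCHEDULE_SUFFIXES):
        return False
    update = body["update"]
    submit(update["alias"], update_nvrs(update))
    return True
```

Because the NVRs are captured at message time, the test system downloads exactly those builds even if they have not reached updates-testing yet.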

It would be easier if this info were available in the message, of course.

AdamWill commented 2 years ago

Further on this: the koji-build-group.build.complete messages are different from all the other update-related messages because they were changed specifically to fit Fedora CI's requirements. See https://pagure.io/fedora-ci/general/issue/70 and https://github.com/fedora-infra/bodhi/pull/3629 .

The build-group.build.complete messages for stable releases are sent, I believe, when an updates-testing push that includes the update is done, so the message indicates that the update should now be available from the updates-testing repository. For Rawhide (and Branched pre-Beta freeze) the updates-testing repository is not enabled, so this doesn't make sense, which is probably why the message is sent when the update is created.
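
As a sketch, the timing described above (stated in the thread as a belief, not confirmed here) reduces to one decision per release. `Release`, `has_updates_testing`, `Trigger` and `build_complete_trigger` are hypothetical names, not Bodhi's data model.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    ON_CREATION = "update created"
    ON_TESTING_PUSH = "updates-testing push including the update"


@dataclass
class Release:
    name: str
    # HYPOTHETICAL flag: True when the release has an updates-testing
    # repository (stable releases, Branched after the Beta freeze).
    has_updates_testing: bool


def build_complete_trigger(release: Release) -> Trigger:
    """When build-group.build.complete is sent, per the description above."""
    if release.has_updates_testing:
        # The message signals the builds are now in updates-testing.
        return Trigger.ON_TESTING_PUSH
    # Rawhide (and Branched pre-Beta freeze) has no updates-testing repo,
    # so there is no push to wait for.
    return Trigger.ON_CREATION
```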

I've noticed recently that bodhi.update.request.testing is not a perfect proxy for "update created" (which, as noted above, is how we treat it for openQA scheduling), because an update will not be submitted to testing on creation if a gating policy applicable to the update gates push-to-testing. In that case the update is submitted to testing only once that policy is satisfied. If it never is (e.g. if a required CI test fails or isn't run), the update is never submitted to testing, so openQA never runs on it.

To fix this, I'm planning to have openQA schedule on build-group.build.complete messages as well as request.testing. To do that without a lot of messing around, though, I need the build-group.build.complete messages to look more like the other messages: specifically, they need to include the update dict that all the other messages have, because that representation carries information the artifact dict does not (such as whether the update is critical path). I have sent a PR to do this.
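
A minimal sketch of scheduling on both topics without testing the same builds twice. It assumes both messages carry the full update dict under `body["update"]` (the change the PR mentioned above would make for build-group messages) with `alias`, `critpath` and `builds` fields; `DualTopicScheduler` is a hypothetical class.

```python
from typing import Any, Dict, FrozenSet, Optional, Set, Tuple

TOPIC_SUFFIXES = (
    "bodhi.update.request.testing",
    "koji-build-group.build.complete",
)


class DualTopicScheduler:
    """Hypothetical openQA-style scheduler listening to both topics."""

    def __init__(self) -> None:
        # (update alias, set of NVRs) pairs that already have tests scheduled.
        self._seen: Set[Tuple[str, FrozenSet[str]]] = set()

    def handle(self, topic: str, body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        if not topic.endswith(TOPIC_SUFFIXES):
            return None
        update = body.get("update")
        if update is None:
            # An artifact-only message lacks fields such as "critpath".
            return None
        nvrs = frozenset(build["nvr"] for build in update["builds"])
        key = (update["alias"], nvrs)
        if key in self._seen:
            # Already scheduled from the other topic for these exact builds.
            return None
        self._seen.add(key)
        return {"alias": update["alias"], "nvrs": sorted(nvrs),
                "critpath": update["critpath"]}
```

Keying on the alias plus the NVR set means whichever of the two messages arrives first schedules the tests and the second is ignored, while an edit that changes the builds gets scheduled again.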

Perhaps what we both really need, though, is an update.created message that includes both the update and artifact dicts and is always sent on update creation, regardless of the release. That way, test systems that can test the update without waiting for it to actually appear in updates-testing (which seems to include both of ours) can test it promptly upon creation.

Having both dicts is kinda ugly, since they're really just two slightly different sets of information about the same thing (the update, and what it contains). But including both likely requires the least change to both existing schedulers. Combining all the info both schedulers require in a single dict would be neater, but would most likely require more changes to the schedulers.
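
To illustrate the "least change" argument, here is a sketch of what a proposed update.created body could look like, with each existing scheduler reading only the dict it already understands. The topic name, the artifact layout and the `task_ids` input (NVR to Koji task ID) are all assumptions; no such message exists in Bodhi today.

```python
from typing import Any, Dict, List

# HYPOTHETICAL topic: this is the proposal above, not an existing message.
UPDATE_CREATED_TOPIC = "bodhi.update.created"


def make_update_created(update: Dict[str, Any],
                        task_ids: Dict[str, int]) -> Dict[str, Any]:
    """Sketch of a proposed update.created body carrying both dicts.

    `task_ids` (NVR -> Koji task ID) is an assumed input standing in for
    whatever build records Bodhi would use; the artifact layout is assumed.
    """
    artifact = {
        "type": "koji-build-group",
        "builds": [{"nvr": b["nvr"], "task_id": task_ids[b["nvr"]]}
                   for b in update["builds"]],
    }
    return {"update": update, "artifact": artifact}


def fedora_ci_task_ids(body: Dict[str, Any]) -> List[int]:
    # A task-ID-based CI scheduler keeps reading only the artifact dict.
    return [b["task_id"] for b in body["artifact"]["builds"]]


def openqa_job(body: Dict[str, Any]) -> Dict[str, Any]:
    # An openQA-style scheduler keeps reading only the update dict.
    update = body["update"]
    return {"nvrs": [b["nvr"] for b in update["builds"]],
            "critpath": update["critpath"]}
```

Neither consumer changes its parsing; the price is that the NVRs appear in both dicts.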

AdamWill commented 1 year ago

So I've been thinking about this area a lot today, and made various false starts, but I think I have a plan that would work for everyone: https://pagure.io/fedora-ci/general/issue/436#comment-872389 . Thoughts welcome.

fedora-infra / bodhi issue #4187