eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0
6.99k stars 1.19k forks

Improve Release Workflow #22258

Closed SDawley closed 9 months ago

SDawley commented 1 year ago

Is your task related to a problem? Please describe

I've noticed while running releases that sometimes a downstream action will fail, but the failure isn't registered so the release proceeds anyway, which causes issues in the dependent releases.

For example, in the 7.68.0 release the first run completed successfully even though Release Che E2E Tests failed.

Since nothing depends on Che E2E, proceeding when that particular release fails may be OK, but in the 7.67.0 release Che-Plugin-Registry failed, which ended up causing issues for dependent releases.

Describe the solution you'd like

Currently the make-release script checks for the container and the branch, but additional steps can fail, forcing the release to be re-run and the container potentially regenerated (which was the issue with plugin-registry). I would like to investigate whether there's a way to check that the downstream release action itself completed successfully.
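
To make the idea concrete, here is a rough sketch of such a check, not a description of how make-release works today. It assumes the orchestrator can obtain a workflow-run record shaped like the GitHub REST API's workflow run object, with `status` and `conclusion` fields. `fetch_run` is a hypothetical callable that returns the latest record for the downstream release run; how it gets that record (REST call, `gh` CLI, etc.) is left open.

```python
import time
from typing import Callable, Optional


def run_outcome(run: dict) -> Optional[bool]:
    """Return True/False once the run has completed, None while it is still running."""
    if run.get("status") != "completed":
        return None
    # Only an explicit "success" conclusion counts; failure, cancelled,
    # timed_out, etc. are all treated as a failed release.
    return run.get("conclusion") == "success"


def wait_for_release(
    fetch_run: Callable[[], dict],      # hypothetical: returns the latest run record
    timeout_s: float = 3600,
    poll_s: float = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the downstream release run finishes; return whether it succeeded."""
    deadline = time.monotonic() + timeout_s
    while True:
        outcome = run_outcome(fetch_run())
        if outcome is not None:
            return outcome
        if time.monotonic() >= deadline:
            raise TimeoutError("downstream release run did not complete in time")
        sleep(poll_s)
```

With a check like this, the orchestrator could stop before kicking off dependent releases whenever `wait_for_release` returns `False`, instead of relying only on the container and branch being present.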

Describe alternatives you've considered

No response

Additional context

No response

SDawley commented 1 year ago

@mkuznyetsov I did some investigating and it looks like https://github.com/marketplace/actions/wait-on-check might be something we can use?

The issue with the 7.67.0 release was that plugin-registry failed on the publish gh-pages command, then subsequently on the publish to npmjs command. Since the main release pipeline only checks for images, it continued the release and kicked off later stages. I had to re-release plugin-registry to fix the various publishes, so dependent stages ended up looking for the wrong image and things broke.

mkuznyetsov commented 1 year ago

@SDawley yes, or something like https://github.com/convictional/trigger-workflow-and-wait , which would make it possible to configure the whole job-triggering sequence in the workflow.yaml file rather than in a *.sh script as we do now. We'd need to verify how it behaves in non-standard use cases, though: after individual failed releases are rerun for certain projects, the release orchestrator must correctly determine which projects have already been released (had successful workflow runs for the 7.x.y version) and skip them.
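
The skip logic could look roughly like the sketch below. It is only an illustration under stated assumptions: `ReleaseRun` is a hypothetical record that the orchestrator would have to build itself (for example from workflow run history), and the project names in the usage example other than plugin-registry are placeholders.

```python
from typing import Iterable, List, NamedTuple


class ReleaseRun(NamedTuple):
    """Hypothetical record of one release workflow run for one project."""
    project: str
    version: str
    conclusion: str  # e.g. "success" or "failure"


def pending_projects(order: List[str], runs: Iterable[ReleaseRun], version: str) -> List[str]:
    """Return projects, in release order, with no successful run for `version`."""
    released = {
        r.project
        for r in runs
        if r.version == version and r.conclusion == "success"
    }
    # A project that failed first and succeeded on a rerun counts as released.
    return [p for p in order if p not in released]


# Usage: plugin-registry failed once and was then re-released successfully,
# so only the remaining projects are still pending for 7.67.0.
history = [
    ReleaseRun("che-plugin-registry", "7.67.0", "failure"),
    ReleaseRun("che-plugin-registry", "7.67.0", "success"),
    ReleaseRun("project-b", "7.67.0", "failure"),
    ReleaseRun("project-b", "7.66.0", "success"),
]
todo = pending_projects(["che-plugin-registry", "project-b", "project-c"], history, "7.67.0")
```

Matching on both project and version matters here: a successful run for an earlier version must not cause the current release to be skipped.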

che-bot commented 9 months ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.