Closed adamfarley closed 5 days ago
Build repo release branches don't have mandatory PR review, probably because the settings regex does not match them...?
Build repo code freeze check for the release branch was not enabled, but then I thought, do we really need it, especially if we get the mandatory review on release branches fixed?
Currently the dryrun tag is the tag before the suspected actual GA tag, since it's not easy to "reset" the auto-trigger. Maybe we ought to fix that...?
FYI, a bit naff, but to do a trigger "reset" (I had to do one for a failed dryrun trigger), as a Jenkins "Admin":
println "rm /home/jenkins/workspace/build-scripts/utils/releaseTrigger_jdk23/workspace/tracking".execute().text
getTestDependency was failing on temurin-compliance due to no authentication: https://github.com/adoptium/aqa-tests/issues/5589 This was failing in the July release as well, but failure of this stage does not fail the job, which means we use the workspace cache, if we have one, and whatever may be there!
re: https://github.com/adoptium/temurin/issues/54#issuecomment-2344011663
This was failing in the July release as well, but failure of this stage does not fail the job, which means we use the workspace cache, if we have one
I do not think anything in the dependencies list gets used by the TC jobs (though it could have an effect if we are using TC Grinder to verify AQAvit tests; most dependencies do not change often, so cached versions are fine).
TRSS needs new JDK versions added before release week; release-openjdk23-pipeline was missing.
SL/Sept12 - now added
We should be more accurate with our release process terminology: "Publish updates to the containers to dockerhub" should be "Publish docker images to dockerhub".
When doing the triage, the tap files of the Grinder should be attached to the triage issue, for example https://github.com/adoptium/aqa-tests/issues/5598 so that the job https://ci.adoptium.net/view/Test_grinder/job/TAP_Collection can collect the tap files of both the pipeline job and the Grinder.
For TRSS, if the rerun job passes, the corresponding test job status should be set to pass, so there is no need to do extra triage. For example, in https://trss.adoptium.net/resultSummary?parentId=66e2f744d24e1b006e88e097 (aarch64_mac) the extended.openjdk rerun passed, so extended.openjdk should be set as success.
@adamfarley says: This issue has been raised here.
AQA triage: use the auto-generated rerun links of the rerun test job, which are already prepopulated with either the failed test targets or the failed test cases. https://ci.adoptium.net/job/Test_openjdk23_hs_extended.openjdk_x86-64_windows_rerun/19/
Quick check to make when triaging: look at the rerun.tap file on the Jenkins job; if it's green, there is nothing to do.
We should also have a different chiclet icon for this "state" where the rerun job passes. Suggest a yellow chiclet with a small green circle in the top right corner for that state, and so forth. Related issue: https://github.com/adoptium/aqa-test-tools/issues/912
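As a rough illustration of that quick check, here is a minimal Python sketch that reads the text of a TAP file and reports whether every test line is "ok". The function name `tap_all_passed` is made up for this example, and the sketch deliberately treats any "not ok" line as a failure (it ignores TAP directives such as TODO), so it is an approximation rather than a full TAP parser.

```python
import re


def tap_all_passed(tap_text: str) -> bool:
    """Return True if the TAP text has at least one test line and no 'not ok' line.

    Assumption: any 'not ok' line counts as a failure; TAP directives
    (e.g. '# TODO') are not interpreted in this sketch.
    """
    saw_test_line = False
    for raw_line in tap_text.splitlines():
        line = raw_line.strip()  # nested TAP output may be indented
        if re.match(r"not ok\b", line):
            return False
        if re.match(r"ok\b", line):
            saw_test_line = True
    # An empty or header-only file is not treated as "green".
    return saw_test_line
```

For example, passing the contents of a downloaded rerun.tap file to `tap_all_passed` gives a yes/no answer on whether any further triage is needed for that rerun.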
Almost no test jobs were triggered by openjdk-pipeline or evaluation-openjdk-pipeline during the September release (i.e. EA builds triggered nightly or weekly), as we set around 10 days before and 5 days after the release as the window with no nightly test jobs (see https://github.com/adoptium/ci-jenkins-pipelines/blob/master/pipelines/build/common/trigger_beta_build.groovy#L53-L79 ). That might be fine for the January, March, July and September releases, but may not be good for the October and April releases.
Due to the scheduling of releases in September and October, as well as in March and April, there is a potential overlap that could result in gaps in testing. Specifically, with releases in March and September followed closely by April and October, there may be minimal time available for comprehensive testing between those consecutive releases. As a result, critical tests may be rushed or omitted, impacting the stability of those releases. For example, the reproducible comparing tests on Linux were updated on Sep 6th, and after that the test had only been run once with jdk24 by Oct 2.
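To make the window arithmetic concrete, here is a small Python sketch. It is not the logic in trigger_beta_build.groovy; it only assumes the "around 10 days before and 5 days after" figures quoted above, and the function names and the release dates used in the usage note are illustrative assumptions.

```python
from datetime import date, timedelta

# Assumed from the comment above: roughly 10 days before and 5 days after a release.
DAYS_BEFORE = 10
DAYS_AFTER = 5


def in_no_test_window(day: date, release_day: date,
                      days_before: int = DAYS_BEFORE,
                      days_after: int = DAYS_AFTER) -> bool:
    """True if 'day' falls inside the no-nightly-test window around 'release_day'."""
    start = release_day - timedelta(days=days_before)
    end = release_day + timedelta(days=days_after)
    return start <= day <= end


def free_days_between(first_release: date, second_release: date,
                      days_before: int = DAYS_BEFORE,
                      days_after: int = DAYS_AFTER) -> int:
    """Number of days with nightly tests enabled between two consecutive releases."""
    end_of_first = first_release + timedelta(days=days_after)
    start_of_second = second_release - timedelta(days=days_before)
    return max(0, (start_of_second - end_of_first).days - 1)
```

With two hypothetical release days four weeks apart, say `date(2024, 9, 17)` and `date(2024, 10, 15)`, `free_days_between` returns 12, which shows how little nightly testing time is left between back-to-back releases under these assumed window sizes.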
To add some extra info: for example, the jdk-21.0.5+7 and +8 EA builds both landed during the Sept release "disabled test" period; jdk-21.0.5+6 EA was the last build run with tests prior to the release, and jdk-21.0.5+9 the first after.
October release
October: Care needs to be taken when publishing binaries to check whether a platform was rebuilt. For example, both jdk17 macAarch64 and jdk17 pLinux were rebuilt, but binaries were still present on the original pipeline. Mac was initially published from the wrong one.
Can we remove bad build artifacts when we rebuild?
October: we forgot to publish JDK11 aarch64 mac even though it had been finished for several days.
The status-by-platform document https://github.com/adoptium/temurin/issues/60 is not always being updated... I think we need to automate this; it's too easy to forget or to update it wrongly.
Mistakes were made in selecting publish job links, meaning a platform didn't get published when we said it was, due to clicking on WindowsX64 rather than Windowsx32...
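One way to picture the rebuild check is the Python sketch below: given a list of finished builds, it picks the most recent one per platform and flags any platform that has more than one candidate, so the publisher knows to double-check the source pipeline. The `BuildRecord` type, its fields, and the function name are hypothetical and do not correspond to any existing Adoptium tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BuildRecord:
    """Hypothetical record of one finished build for one platform."""
    platform: str          # e.g. "jdk17 macAarch64"
    pipeline_url: str      # pipeline run that holds the binaries
    finished_at: datetime  # when that build finished


def pick_publish_candidates(records):
    """Return (latest build per platform, sorted list of platforms built more than once)."""
    latest = {}
    counts = {}
    for rec in records:
        counts[rec.platform] = counts.get(rec.platform, 0) + 1
        current = latest.get(rec.platform)
        if current is None or rec.finished_at > current.finished_at:
            latest[rec.platform] = rec
    rebuilt = sorted(p for p, n in counts.items() if n > 1)
    return latest, rebuilt
```

Any platform listed in the second return value was rebuilt at least once, so binaries from an older pipeline run may still exist and should not be the ones published.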
aarch64 windows was added as a platform for jdk21 and jdk23, but there were several changes required for it to be ready.
This could have happened well ahead of the release period (as per the plan discussed in a past PMC meeting). It could also have been seen during a dry run, but no dry run was performed. (Were other checklist items not completed? It seemed the release champion was not always present, and in that event missed the opportunity to communicate that to others and ensure tasks were delegated.)
We need to invest resources in making the installer publishing a lot better and automated. In its current form it mentally scars you!!
https://github.com/adoptium/aqa-tests/issues/5692#issuecomment-2429722283
Some arm32 jdk8 tests used to work on non-container agents. It seems we don't have those any more: https://ci.adoptium.net/label/ci.role.test&&sw.os.linux&&hw.arch.aarch32/. If the tests can only pass on non-container agents, we might need to do a vendor exclude, due to the limitations of our Eclipse machine farm. https://github.com/adoptium/aqa-tests/blob/master/openjdk/excludes/vendors/eclipse/ProblemList_openjdk8.txt
I think this release has demonstrated the necessity of a dry-run, and the issue with the "installers" and the new Azure VMs possibly demonstrates the need for a dry-run installers upload too?
NOTE: Proposal to move the releasing guide to a wiki in either the build or one of the top level repositories: https://github.com/adoptium/temurin-build/pull/3993
Memo to self: Discuss deadlock potential with x64 nodes during installer process.
Raise issues (@adamfarley)
Other actions: Andrew:
Someone:
Other data:
Next retrospective - https://github.com/adoptium/temurin/issues/64
Summary
A retrospective for all efforts surrounding the titular releases.
All community members are welcome to contribute to the agenda via comments below.
This will be a virtual meeting after the release, with at least a week of notice in the #release Slack channel.
On the day of the meeting we'll review the agenda and add a list of actions at the end.
Invited: Everyone.
Time, Date, and URL
Time: 3-4pm UTC
Date: Monday the 18th of November
URL: https://meet.google.com/uwc-iwjn-rqm
Details
Retrospective Owner Tasks (in order):
TLDR
Add proposed agenda items as comments below.