Discuss capturing release note worthy OpenJDK issues during a release

tellison commented 2 years ago

Temurin releases contain noteworthy changes from OpenJDK and Temurin projects. While we will capture Temurin project changes ourselves as part of ongoing development (e.g. consider tagged issues), the OpenJDK project uses JIRA for tracking such noteworthy changes.

Aleksey Shipilëv is already capturing these in his backports monitor and shares the code containing the required JIRA query to extract these correctly.

For the purpose of documenting Temurin releases fully, we should be capturing the results of that JIRA query in our releases directory (example) alongside the binaries (or as part of the SBOM?) so that we know what went into that build.

Opened as a discussion because I'm not sure that we should be pulling directly from Aleksey's site, but I also don't wish to diverge or duplicate effort in this space. Maybe a local fork, structured so that only the release notes part can be extracted in a suitable format? Other ideas?

zdtsw commented 2 years ago

I am just curious why we have two different git repos for each JDK version, e.g. https://github.com/adoptium/jdk18u and https://github.com/adoptium/temurin18-binaries. I understand the latter is used purely as storage for nightly and official releases. But is there any problem if we push binaries into the former, so that we have both source code and binaries in the same repo? Here we can use the GH "Generate Release Notes" function with two tags (*ga) for an official release (maybe even for a pre-release nightly build). Surely, a list of all git commits from the source code is not as nice as a JIRA query. But if JIRA only shows "fixed version/s" at the JDK version level (e.g. 8, 17, 18), not at the CPU/PSU level, then we will get release notes with all the JIRA info even when the release is a CPU.

tellison commented 2 years ago

> I am just curious why we have two different git repos for each JDK version, e.g. https://github.com/adoptium/jdk18u and https://github.com/adoptium/temurin18-binaries. I understand the latter is used purely as storage for nightly and official releases. But is there any problem if we push binaries into the former, so that we have both source code and binaries in the same repo?

Somewhat historical, as the original OpenJDK code was all based in Mercurial repositories, and we were mirroring it into GitHub to better integrate into our build/test/distribute processes. OpenJDK have moved many repos into GitHub now, but having a plain mirror independent of the binaries distribution repo is still a handy distinction. Hopefully we won't need to recreate the mirrors again now.

> Here we can use the GH "Generate Release Notes" function with two tags (*ga) for an official release (maybe even for a pre-release nightly build). Surely, a list of all git commits from the source code is not as nice as a JIRA query. But if JIRA only shows "fixed version/s" at the JDK version level (e.g. 8, 17, 18), not at the CPU/PSU level, then we will get release notes with all the JIRA info even when the release is a CPU.

Producing release notes from JIRA requires just a little more logic than a list of the GitHub commits or all fixed JIRAs, as shown by Aleksey's code, so yes we don't want to just pick up everything tagged by major fix version.

Aleksey's code is capable of outputting the selected issues summary in text and html format - we'd probably want to capture them in json so they can be rendered on the website or queried as release notes.

BethGriggs commented 1 year ago

Based on https://github.com/adoptium/website-v2/pull/1029#issuecomment-1253353724, I've tried to capture a proposed flow of where release notes should be generated, and where the output should be published, and how the website will fetch the data to render.

flowchart TD
    OpenJDK_Version[/OpenJDK Version/] --> job[Release Notes Jenkins Job]
    job --> | Queries | CVE_data[(CVE Data)] --> job
    job --> | Queries | Notes_data[(Release Notes Data)] --> job
    job[Generate Release Notes] --> | Publishes | GitHub    
    GitHub[(GitHub Release Assets)]
    Website[Adoptium Website] --> | Fetches | GitHub

Open questions:

- Is storing the release-notes as a GitHub asset a reasonable approach?
- What format(s) do we want the raw/intermediate format to be published in?
  - Thoughts: JSON makes it easy to parse/render on the website; txt makes it human-readable, and therefore useful as a standalone asset.
- What is the best source of the Release Notes Data?
  - OpenJDK Jira via a REST Query (Prototyped in https://github.com/adoptium/website-v2/issues/103)
  - The output of running Aleksey's tool.
    - The tool currently outputs release notes in only .txt and .html formats. We could extend it to output JSON if that's the intermediate format we're aiming for.
- Where can we obtain CVE information for a given OpenJDK release?
  - As we'll always be generating release notes after the upstream release has gone out, we should be able to find a public source. If not, we may need to find a way of manually supplying which CVEs are known to be fixed in a given release, and ensure it's aggregated into the release notes in a consistent format.

jiekang commented 1 year ago

I think well-formed JSON is about as human-readable as txt, so I'd +1 that for the intermediate data.

tellison commented 1 year ago

Let me add my 2c to the open questions:

> Is storing the release-notes as a GitHub asset a reasonable approach?

Yes, I think this is a reasonable approach. Release notes are a "deliverable asset" from Adoptium alongside the binary, SBOM, signatures, etc. The temurin-build folks can help to produce that asset.

> What format(s) do we want the raw/intermediate format to be published in?
>
> Thoughts: JSON makes it easy to parse/render on the website; txt makes it human-readable, and therefore useful as a standalone asset.

The raw assets should be primarily machine readable, and as a bonus readable by humans - so json fits the bill. These assets are designed to be consumed by multiple "clients" (not just humans), one of which is the website that converts the release notes asset into a human-readable format. Others include scripts and parsers that use release note information for analysis and other tasks.

> What is the best source of the Release Notes Data?
>
> - OpenJDK Jira via a REST Query (Prototyped in https://github.com/adoptium/website-v2/issues/103)
> - The output of running Aleksey's tool.

OpenJDK Jira is the definitive data source, and so we should use that. The prototype is helpful and should be used as a basis for an explicit extraction task (script) run as part of the build process for each retained build set.

> The tool currently outputs release notes in only .txt and .html formats. We could extend it to output JSON if that's the intermediate format we're aiming for.

Not necessary, just pull straight from OpenJDK Jira. Aleksey's tool is an example we can look at, but not depend upon directly.

> Where can we obtain CVE information for a given OpenJDK release?
>
> As we'll always be generating release notes after the upstream release has gone out, we should be able to find a public source. If not, we may need to find a way of manually supplying which CVEs are known to be fixed in a given release, and ensure it's aggregated into the release notes in a consistent format.

I think that is an open question. IIUC, CVE info requires a login to Jira, and we don't have a suitable login for an extraction tool to use. I would start by looking at the manual process temporarily while we have that discussion about the safe way to get CVE information. @jerboaa ?

jerboaa commented 1 year ago

> > Where can we obtain CVE information for a given OpenJDK release?
> >
> > As we'll always be generating release notes after the upstream release has gone out, we should be able to find a public source. If not, we may need to find a way of manually supplying which CVEs are known to be fixed in a given release, and ensure it's aggregated into the release notes in a consistent format.
>
> I think that is an open question. IIUC, CVE info requires a login to Jira, and we don't have a suitable login for an extraction tool to use. I would start by looking at the manual process temporarily while we have that discussion about the safe way to get CVE information. @jerboaa ?

The advisories at https://openjdk.org/groups/vulnerability/advisories/, which are published via vulnerability-announce@openjdk.org for every critical patch update, would be a starting point. Perhaps it would be worth discussing making this info more machine-readable. That would probably need to be discussed within the vulnerability group.

gnu-andrew commented 1 year ago

As far as I'm aware, there is no CVE information in the OpenJDK JIRA, at least not publicly. All the security issues are kept private to Oracle.

We can propose providing the information from the vulnerability group in a machine-readable format. I doubt it will be possible for the upcoming release in two weeks though. If you have an example of what you would want to parse, that would help a lot.

BethGriggs commented 1 year ago

I wrote a very small script in https://github.com/BethGriggs/release-notes-prototype to attempt to pull out the JSON we need. Also pushed an early example of the JSON output.

I think we need agreement on what properties/fields we need to capture. So far I have the following:

 {
    "id": "JDK-8278472",
    "title": "Invalid value set to CANDIDATEFORM structure",
    "description": "According to the Windows API reference[1], dwStyle of CANDIDATEFORM structure should be set to CFS_CANDIDATEPOS or CFS_EXCLUDE. So, CFS_POINT is wrong here.\r\n  \r\nSee line 3914 in src\\java.desktop\\windows\\native\\libawt\\windows\\awt_Component.cpp [2], AwtComponent::SetCandidateWindow function:\r\n        CANDIDATEFORM cf;\r\n        cf.dwStyle = CFS_POINT;\r\n        ImmGetCandidateWindow(hIMC, 0, &cf);\r\n\r\n[1] https://docs.microsoft.com/en-us/windows/win32/api/imm/ns-imm-candidateform\r\n[2] https://github.com/openjdk/jdk/blob/f90425a1cbbc686045c87086af586e62f05f6c49/src/java.desktop/windows/native/libawt/windows/awt_Component.cpp#L3914",
    "priority": "3",
    "component": "client-libs",
    "subcomponent": "client-libs/java.awt:i18n",
    "link": "https://bugs.openjdk.java.net/browse/JDK-8278472",
    "type": "Bug"
  },

Here's an example REST API call which demonstrates all the data we get, per issue, from the API I am using. The query string of the call, URL-decoded (its leading part is elided here), is:

    … AND (resolution not in ("Won't Fix", "Duplicate", "Cannot Reproduce", "Not an Issue", "Withdrawn")) AND (labels not in (release-note, testbug, openjdk-na, testbug) OR labels is EMPTY) AND (summary !~ "testbug") AND (summary !~ "problemlist") AND (summary !~ "problem list") AND (summary !~ "release note") AND (issuetype != CSR) AND fixVersion=11.0.16&maxResults=1

(The script is a few lines of JS currently because that's the easiest language for me to get my thoughts out. It might be able to be converted to Bash + jq - if that is considered more maintainable for this project.)
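As a rough illustration of how the visible filter clauses in the example call could be assembled and URL-encoded, here is a minimal Python sketch (the prototype itself is JavaScript). The function and constant names are hypothetical, the leading clause of the original query is not shown in this thread and is deliberately left out, and no host or endpoint is assumed; only the query string is built.

```python
from urllib.parse import quote, urlencode

# Filter values copied from the visible part of the example call.
# The leading clause of the original query is not visible, so it is omitted.
EXCLUDED_RESOLUTIONS = ["Won't Fix", "Duplicate", "Cannot Reproduce",
                        "Not an Issue", "Withdrawn"]
EXCLUDED_LABELS = ["release-note", "testbug", "openjdk-na"]
EXCLUDED_SUMMARY_TERMS = ["testbug", "problemlist", "problem list", "release note"]


def build_jql(fix_version):
    """Join the exclusion clauses and the fixVersion filter with AND."""
    resolutions = ", ".join(f'"{r}"' for r in EXCLUDED_RESOLUTIONS)
    labels = ", ".join(EXCLUDED_LABELS)
    clauses = [
        f"(resolution not in ({resolutions}))",
        f"(labels not in ({labels}) OR labels is EMPTY)",
    ]
    clauses += [f'(summary !~ "{term}")' for term in EXCLUDED_SUMMARY_TERMS]
    clauses.append("(issuetype != CSR)")
    clauses.append(f"fixVersion={fix_version}")
    return " AND ".join(clauses)


def build_query_string(fix_version, max_results=1):
    """Percent-encode the JQL (spaces as %20) together with maxResults."""
    params = {"jql": build_jql(fix_version), "maxResults": max_results}
    return urlencode(params, quote_via=quote)
```

Calling `build_query_string("11.0.16")` yields a string of the form `jql=...&maxResults=1`, which would be appended to whichever Jira search URL the script uses.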

BethGriggs commented 1 year ago

Update

I now have two scripts in https://github.com/BethGriggs/release-notes-prototype (see documentation in that repository)

fetchCommitList traverses the Git history between two tags, pulls out the JDK numbers and commit hashes and writes the output to a commits.json file.

fetchReleaseNotes uses the commits.json, fetches additional info from Jira, and writes a named release-notes.json file.
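The first step can be sketched roughly as follows, in Python rather than the prototype's JavaScript. This assumes the input is the output of `git log --format="%H %s" <from-tag>..<to-tag>` and that commit subjects start with the bug number followed by a colon; the record fields (`id`, `hash`, `subject`) and the function name are illustrative assumptions, not the actual layout of commits.json.

```python
import re

# OpenJDK commit subjects start with the bug number, e.g. "8294333: (tz) Update ..."
SUBJECT_RE = re.compile(r"^(?P<number>\d+): (?P<summary>.+)$")


def parse_commit_lines(lines):
    """Turn '<hash> <subject>' lines into commit records.

    Lines whose subject does not start with a bug number are skipped.
    """
    commits = []
    for line in lines:
        commit_hash, _, subject = line.strip().partition(" ")
        match = SUBJECT_RE.match(subject)
        if match is None:
            continue  # no JDK number in the subject
        commits.append({
            "id": f"JDK-{match.group('number')}",
            "hash": commit_hash,
            "subject": subject,
        })
    return commits
```

The hashes in any test input are made up; in practice the lines would come from running `git log` between the two release tags.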

Release notes are currently in this form:

  [
    {
      "id": "JDK-8294333",
      "title": "(tz) Update Timezone Data to 2022c",
      "priority": "3",
      "component": "core-libs",
      "subcomponent": "core-libs/java.time",
      "link": "https://bugs.openjdk.java.net/browse/JDK-8294333",
      "type": "Backport",
      "backportOf": "JDK-8292579"
    },
    ...
  ]

For restricted/non-public JDK issues we just include the commit message as the title and the JDK number. I have also run a number of tests/comparisons with other release note sources to gain confidence that the script output is valid.
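The fallback for restricted issues could look something like the sketch below. It assumes the Jira details have already been flattened into a dict with the same keys as the release notes entry (or `None` when the issue is not publicly accessible); `make_entry` and that flattened input are hypothetical, not the prototype's actual code.

```python
LINK_PREFIX = "https://bugs.openjdk.java.net/browse/"
JIRA_KEYS = ("priority", "component", "subcomponent", "type", "backportOf")


def make_entry(jdk_id, commit_subject, jira=None):
    """Build one release notes entry.

    jira is a flat dict of details fetched from Jira, or None when the
    issue is restricted; in that case the commit message is used as the
    title and the Jira-only fields stay None (null in JSON).
    """
    entry = {"id": jdk_id, "title": commit_subject, "link": LINK_PREFIX + jdk_id}
    for key in JIRA_KEYS:
        entry[key] = None
    if jira is not None:
        entry["title"] = jira.get("title", commit_subject)
        for key in JIRA_KEYS:
            entry[key] = jira.get(key)
    return entry
```

Keeping the same set of keys for public and restricted issues means consumers never have to check whether a field exists, only whether it is null.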

Next steps

[ ] Determine where the release note scripts should live in the Adoptium GitHub.
[ ] Create a Jenkins job with the appropriate parameters to pass to the scripts.
- [ ] Also determine where the job fits in the pipeline.
[ ] In the Jenkins script, add a step which persists the release-notes.json file as a GitHub asset for the release.
[ ] Update https://github.com/adoptium/website-v2/pull/1029 to pull from the GitHub asset.
[ ] [extras] Minor script tweaks, add test cases, etc.

I'm on vacation until Thursday next week, but I have given @sxa write access to the repository while it's still on my account in case any tweaks are necessary in my absence.

smlambert commented 1 year ago

A good candidate location for the release notes script is the https://github.com/adoptium/github-release-scripts/ repository. (related: https://github.com/adoptium/github-release-scripts/issues/92).

sxa commented 1 year ago

> A good candidate location for the release notes script is the https://github.com/adoptium/github-release-scripts/ repository. (related: adoptium/github-release-scripts#92).

Yep that's my preferred location for this tool too.

smlambert commented 1 year ago

One other possible location would be in this repository as part of the Temurin build.

Why? It's a build artifact, and other artifacts like the SBOM get produced as part of the build, so perhaps creation of release notes should occur at this stage, for one primary platform like x64 Linux (we do not need separate notes for separate platforms, so this can be handled the same way as source.zip, with one copy archived as part of the overall pipeline). Two other reasons to consider putting the scripts here and generating the notes at the compile stage:

- it makes the creation of release notes more 'portable' to other CI systems (if we build Temurin using the build-jdk GitHub action)
- if we eventually want to amend our SBOM to refer to a VEX (Vulnerability Exploitability eXchange) extension (which I am going to suggest we will want to do), it may be easier to do so when both release notes and SBOM generation occur at the same time (TBD)

BethGriggs commented 1 year ago

An update of where this is at:

Scripts exist to generate release notes based on Git tags.
- https://github.com/BethGriggs/release-notes-prototype/
Jenkins job exists that runs the scripts and saves the JSON output as an artefact of the build.
- https://ci.adoptopenjdk.net/view/work-in-progress/job/create_release_notes/
A GitHub asset for 19.0.2 has manually been published for testing/evaluation.
- https://github.com/adoptium/temurin19-binaries/releases/tag/jdk-19.0.2%2B7.

Left to do:

[x] Move the scripts to an Adoptium repository
[x] Agree on the naming convention for the release notes assets.
- [x] Add a FILENAME parameter to the job so this can be set manually while we're still agreeing.
- OpenJDK-jdk-19.0.2-ga-release-notes.json is the current naming, OpenJDK19U-jdk-release-notes_19.0.2_7.json has been suggested.
[ ] Resolve the error in the upload release job so manual upload is not required.
- I cannot access this job - @sxa ran it for me and it hit an error.
[x] Extend the Adoptium API to expose an endpoint to pull down the release notes assets.
- @gdams mentioned @johnoliver may look at this.
[x] Update my existing website PR to prototype rendering them on the webpage.
- Initially, I'm going to try fetching the asset directly until the Adoptium API endpoint is available for it.

sxa commented 1 year ago

JSON files for the latest release (To be considered experimental and subject to change) are in the releases at

tellison commented 1 year ago

Reading the release notes, why do some entries have null values? Peering at the commits/issues I don't see a pattern, e.g. (from OpenJDK17U-jdk-release-notes_17.0.6_10.json)

  {
    "id": "JDK-8274296",
    "title": "8274296: Update or Problem List tests which may fail with uiScale=2 on macOS",
    "priority": null,
    "component": null,
    "subcomponent": null,
    "link": "https://bugs.openjdk.java.net/browse/JDK-8274296",
    "type": null,
    "backportOf": null
  },

BethGriggs commented 1 year ago

@tellison that case is now resolved by @gdams in https://github.com/BethGriggs/release-notes-prototype/commit/8fc1241375638c99fdbaeb49e96148ec2831f9e8 - the Jira query was omitting testbugs. Originally, that was intentional when we were just going to use the output of the Jira API. But now that we're treating the Git history as the source of truth, testbugs are included in the release notes output. The aforementioned fix means we now get the complete values from Jira for them.

Security restricted JDK issues will have values of null as we cannot access the information from the Jira query.
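For a consumer of the JSON asset, this means null fields need a defined bucket. A minimal sketch of such a consumer, assuming the entry shape shown earlier in this thread; the function names and the fallback label are hypothetical:

```python
import json
from collections import defaultdict


def load_release_notes(text):
    """Parse the release notes JSON text into a list of entry dicts."""
    return json.loads(text)


def group_by_component(entries, fallback="(not available)"):
    """Map each component to the issue ids under it.

    Entries whose component is null (e.g. security-restricted issues)
    are collected under the fallback label instead of being dropped.
    """
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.get("component") or fallback].append(entry["id"])
    return dict(groups)
```

Collecting the null-valued entries under one label, rather than filtering them out, keeps the rendered list complete with respect to the Git history.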

We will need to regenerate and upload the release note assets if we want them to be updated. I've been working on a couple of other fixes that have been noticed too (so it might be worthwhile doing that too).

tellison commented 1 year ago

Got it, thanks @BethGriggs! I'll assume that gets fixed at some point.

I've got some comments on the website's rendering, but will open a website issue for that. I was going back to the raw data to see what was captured.

sxa commented 1 year ago

Follow-up to earlier comment https://github.com/adoptium/temurin-build/issues/3044#issuecomment-1406878251: the release notes for 18.0.2.1+1 are now up at https://github.com/adoptium/temurin18-binaries/releases/download/jdk-18.0.2.1%2B1/OpenJDK18U-jdk-release-notes_18.0.2.1_1.json (still awaiting a cache refresh on the API for it to be visible at https://adoptium.net/temurin/release-notes/?version=jdk-18.0.2.1+1).
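The asset name and download URL in this thread follow a regular pattern that can be derived from the release tag. The sketch below reproduces that pattern for tags of the form `jdk-<version>+<build>`; the helper names are hypothetical, and other tag styles (such as JDK 8's) are not handled.

```python
from urllib.parse import quote


def split_tag(tag):
    """Split 'jdk-18.0.2.1+1' into (major, version, build)."""
    if not tag.startswith("jdk-") or "+" not in tag:
        raise ValueError(f"unsupported tag format: {tag!r}")
    version, _, build = tag[len("jdk-"):].partition("+")
    major = version.split(".")[0]
    return major, version, build


def release_notes_filename(tag):
    """Asset name in the OpenJDK<major>U-jdk-release-notes_<version>_<build>.json form."""
    major, version, build = split_tag(tag)
    return f"OpenJDK{major}U-jdk-release-notes_{version}_{build}.json"


def release_notes_url(tag):
    """GitHub release download URL; the '+' in the tag is percent-encoded."""
    major, _, _ = split_tag(tag)
    return (
        f"https://github.com/adoptium/temurin{major}-binaries/releases/download/"
        f"{quote(tag, safe='')}/{release_notes_filename(tag)}"
    )
```

For `jdk-18.0.2.1+1` this produces the same URL as the one linked in the comment above.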

adoptium / temurin-build

Discuss capturing release note worthy OpenJDK issues during a release #3044
