etcd-io / etcd

Distributed reliable key-value store for the most critical data of a distributed system
https://etcd.io
Apache License 2.0

Plans for v3.4.19 release #14105

Closed lavacat closed 2 years ago

lavacat commented 2 years ago

As per the discussion during the community meeting, this issue is to estimate the potential work and timeline for the 3.4.19 release.

The 3.4.18 release was on Oct 15, 2021. Here is the table of all commits from the release-3.5 branch (except 'tests: ', 'scripts: ' and 'Merge pull request *') since Oct 6, 2021. I'm assuming that fixes were backported before that.

To proceed, let's agree on:

  1. What version of Go to use for 3.4? I suggest 1.17 since 1.15 is EOL. We can keep go 1.12 in go.mod for compatibility. See [1] and [2]. @ahrtr @ptabor @lilic @hexfusion
  2. Priority of issues to be backported. Feel free to comment in the table above. @serathius @ahrtr

Related discussions about the Go version:
[1] https://github.com/etcd-io/etcd/issues/12840
[2] https://github.com/etcd-io/etcd/issues/13912

ahrtr commented 2 years ago

Thanks @lavacat. Please see my comments below.

Go version

1.17 is definitely not an option for release-3.4. One breaking change introduced in Golang 1.15 is the deprecation of the legacy behavior of treating the CommonName field on X.509 certificates as a host name when no Subject Alternative Names are present. The workaround is to add the value x509ignoreCN=0 to the GODEBUG environment variable to re-enable it. See go1.15#commonname. Note that the GODEBUG=x509ignoreCN=0 flag is removed in Golang 1.17. This means applications that still use the legacy CommonName field in their certificates will run into issues unless they update their certificates.

So Golang 1.17 isn't acceptable for 3.4. Instead, both Golang 1.15 & 1.16 are OK for 3.4. Since Golang supports N-2 versions (1.16, 1.17 and 1.18 for now), Golang 1.16 is better than 1.15. We still need to make sure there is NO impact on the existing applications that use or depend on 3.4.

We also need to evaluate the effort to support Golang 1.16 for 3.4.

Making the pipeline green is the priority

Just as I mentioned in issuecomment-1119280894, the top priority is to fix all the test failures and make the pipeline green. Let's please get this done before talking about the release plan for 3.4.19.

Backport

Again, fixing the pipeline issue is the top priority for now. With regard to backporting, I'd suggest only backporting security and major bug fixes. Of course, this is open to discussion.

lavacat commented 2 years ago

@ahrtr here is what I have right now: https://github.com/etcd-io/etcd/pull/14134

I'm running into issues with the integration tests in the golang:1.16 docker container. I can't get a clean run even after removing the tests that time out. There is always another test that times out on the next run.

ahrtr commented 2 years ago

Just as I mentioned in issues/14135, the 3.4 pipeline has never been green since the day it was created (Jun 24, 2021). It's a really serious problem to me, so I jumped in and spent about two whole days getting it resolved in pull/14136.

Although there are still some flaky test failures, the pipeline can be green after about 1 ~ 2 retries. So we have a good start for the 3.4 pipeline.

There is still a lot of work to do before releasing 3.4.19. The rough plan is as follows:

Milestone 1: Stabilize the pipeline

Milestone 2: cherry pick PRs from 3.5 to 3.4

I will try to figure out a list later; the table provided by @lavacat is a good reference. The high-level thought is that we should only cherry pick bug fixes and security changes. I think we might need to do milestone 2 and milestone 1 at the same time, because cherry picking some bug fixes may also stabilize the pipeline.

Issues/PRs not required for 3.4.19: we should only backport security fixes and major/critical bug fixes.

Milestone 3: release 3.4.19

Once we finish milestone 1 and milestone 2, we can kick off the release of 3.4.19. It would be very helpful if other experienced maintainers could jump in here. cc @hexfusion who used to maintain the stable releases.


Please feel free to chime in if you think any PRs/issues need to be investigated or included in 3.4.19. Please also feel free to let us know if you have any concerns or comments. cc @serathius @ptabor @spzala @hexfusion @lavacat @endocrimes.

I will cherry pick 14087 and 13932 to 3.4 sometime later. Anyone should feel free to work on any item; just drop a message. The task "Add some pipelines with RACE enabled" is a priority; if nobody works on it in the following 1~2 weeks, I may jump in to do it.

lavacat commented 2 years ago

FYI, added https://github.com/etcd-io/etcd/pull/14168. I've run into this flakiness locally.

ahrtr commented 2 years ago

Thanks @lavacat, the test failure is fixed in 14151, which isn't merged yet.

lavacat commented 2 years ago

Item 2 of the list: https://github.com/etcd-io/etcd/pull/14179

ahrtr commented 2 years ago

@lavacat would you have the bandwidth and interest to take a deep dive into 14158 and/or 14159? I ran into 14158 multiple times, but have not had time for a deep dive so far.

lavacat commented 2 years ago

@ahrtr yes, will take a look tomorrow.

ahrtr commented 2 years ago

I talked to @serathius & @spzala, and on second thought, I think we should only backport security fixes and major/critical bug fixes to 3.4.19, so I removed some items from the list. Please see the list in milestone 2. Reasons:

  1. It isn't good to stay on 3.4 longer because it's a 3-year-old release. Users are recommended to upgrade to 3.5.4+ if they need any new features or all known bug fixes;
  2. We still need to support 3.4.x, otherwise we will break release maintenance promises. So we should still release 3.4.19 although users are recommended to upgrade to 3.5.4+;
  3. To minimize the impact, we should only backport security fixes and major/critical bug fixes to 3.4.19. It seems that there are no major/critical bugs on 3.4.18 so far, so we should only backport security fixes. For any bug fixes that have already been cherry picked to 3.4.19, let's keep them as they are. Anyone, please feel free to give feedback if you really need any PR to be cherry picked to 3.4.19.

@serathius @spzala @ptabor @hexfusion @mitake @dims and anyone else, please feel free to comment if you have concerns.

spzala commented 2 years ago

@ahrtr @lavacat thanks for the discussion in this issue. I agree on both points: 1) we should support 3.4.x, and 2) we should go with only the needed fixes (e.g. security fixes or fixes requested by the Kubernetes project/other users, as @ahrtr mentioned) in 3.4.19. For new features/improvements, etcd users should try to move to the latest release.

ahrtr commented 2 years ago

All items included in milestone 1 and milestone 2 are basically done.

There are two unresolved issues in milestone 1, and @lavacat is still investigating them. But both of them should only be test issues, and the pipeline can be green after about 1~2 retries when running into them. So I don't think they are blockers. If @lavacat can get them resolved soon, then I am OK to merge the PR(s). @lavacat could you give an update on the issues?

I moved https://github.com/etcd-io/etcd/pull/13895 out of milestone 2, because it may have a big impact, and etcd isn't subject to the CVE. Please see my comment in https://github.com/etcd-io/etcd/pull/14191#issuecomment-1178573559. So it should be safe. All items in milestone 2 are done.

@endocrimes have you finished the Jepsen test on 3.4? I recall that you mentioned you only reproduced a couple of @aphyr's bugs. Have you found any new issues?

I think we are ready for milestone 3. cc @serathius @hexfusion @spzala @ptabor

serathius commented 2 years ago

Looks great! Thanks for all the help. I think we can move forward with the release.

spzala commented 2 years ago

+1 Great work here! Thank you!

serathius commented 2 years ago

Sorry for being late; I only now had time to look through production issues. It struck me that some were fixed in etcd but never backported to v3.4. Maybe it would make sense to look at them again and decide whether they should be backported. List:

As this is late, feel free to skip them for this release. However, I would recommend we consider them for the next one.

endocrimes commented 2 years ago

Finished up my Jepsen testing. I managed to replicate aphyr's findings, but didn't come across anything too different. Seems like no "new" issues, so 👍

ahrtr commented 2 years ago

Thanks all for the feedback.

"As this is late, feel free to skip them for this release. However, I would recommend we consider them for the next one."

Agreed. Let's consider cherry picking them in 3.4.20 so as to minimize the impact, just as we discussed in https://github.com/etcd-io/etcd/issues/14105#issuecomment-1173064692. The biggest change compared with the last release (3.4.18, released on Oct 15, 2021) is that we bumped golang from 1.12 to 1.16, along with some system packages.

I will kick off releasing 3.4.19 once my last PR https://github.com/etcd-io/etcd/pull/14210 is approved & merged.

ahrtr commented 2 years ago

v3.4.19 has just been released! Thanks everyone!

https://github.com/etcd-io/etcd/releases/tag/v3.4.19