Closed ahrtr closed 2 years ago
Nice to backport: https://github.com/etcd-io/etcd/pull/12335
Nice to backport: #12335
Thanks for the feedback. I just had a quick review on the PR, and it's a minor change and should be safe to backport. Please feel free to deliver a PR for this.
@ahrtr #14169
@ahrtr #14169
Thanks for the feedback. I agree the PR can improve the performance/throughput, but it also added a new flag --max-concurrent-streams
which usually we shouldn't backport. We need to get consensus on this one. Please kindly provide your opinion (thumb up or down?). thx.
Please read https://github.com/etcd-io/etcd/pull/14169#discussion_r917154243 if you want to quickly understand the context of the PR.
Personally, I incline to backporting it to 3.4.
thanks for the feedback. it is actually a bug, pr #8238 only solves a part of the problem (etcd does not enable tls), we have encountered a lot of user feedback (users using APISIX, accessing etcd cluster users through HTTP 1.1x protocol). @ahrtr
I added 14169 into the list, please let me know if anyone have any concerns. @serathius @spzala @ptabor
I pinged the original author of PR 12846, but got no response so far. Is anyone interested in backporting it to 3.4? thx
@ahrtr https://github.com/etcd-io/etcd/pull/14246 - have backported the requested PR. I maybe lacking understanding, but from the looks of it, this function isn't called anywhere (even in the original PR), am I doing this correctly? Thanks. @ptabor @spzala @serathius
@ahrtr #14246 - have backported the requested PR. I maybe lacking understanding, but from the looks of it, this function isn't called anywhere (even in the original PR), am I doing this correctly? Thanks. @ptabor @spzala @serathius
Thanks @vivekpatani for the PR. Are you talking about RemoveMatchFile
? It's supposed to be only called on etcd startup.
@ahrtr I think I had some misunderstanding, the PR is now fixed/mergeable, please feel free to review when you have a minute, thank you!
https://github.com/etcd-io/etcd/pull/13435/? @ahrtr This PR is safe to backport in my opinion.
13435? @ahrtr This PR is safe to backport in my opinion.
Agreed. Please feel free to deliver a PR, thx.
Can we also consider backporting PR https://github.com/etcd-io/etcd/pull/12871? It seems be be quite safe to backport change and it will both reduce memory consumption and reduce work in latency sensitive applierV3backend.Apply.
Can we also consider backporting PR #12871? It seems be be quite safe to backport change and it will both reduce memory consumption and reduce work in latency sensitive applierV3backend.Apply.
It should have already been backported in 12888
@ahrtr https://github.com/etcd-io/etcd/pull/12762 is fix for https://github.com/etcd-io/etcd/issues/12680, in the end the solution was done in one, but three PRs:
As all of them change how pending requests are handled during leader election I would prefer to merge all of them to avoid this logic to diverge between branches. Partial backport means that we could end up not really fixing it.
~Hi @ahrtr ,~
~I have the backport for #13435 ready. However, I currently do not have permission to push a branch to etcd-io/etcd.git
.~
@ramses you'll need to make a "fork" on GitHub, then push to that, and open the PR across repos. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request
Thanks @endocrimes , realized that immediately after :-)
The PR for #13435: https://github.com/etcd-io/etcd/pull/14254
@ahrtr #12762 is fix for #12680, in the end the solution was done in one, but three PRs:
* [raft: postpone MsgReadIndex until first commit in the term #12762](https://github.com/etcd-io/etcd/pull/12762) * [etcdserver: resend ReadIndex request on empty apply request #12795](https://github.com/etcd-io/etcd/pull/12795) * [Read index retry #12780](https://github.com/etcd-io/etcd/pull/12780)
As all of them change how pending requests are handled during leader election I would prefer to merge all of them to avoid this logic to diverge between branches. Partial backport means that we could end up not really fixing it.
Thanks @serathius for the reminder. I will take care of all of them myself later.
I saw the following errors multiple times in 3.4 pipeline, but never get time to have a deep dive,
Any help is welcome! Thanks!
Although there are still some issues which need further investigation, we have already finished backporting all existing PRs.
@endocrimes Do you have bandwidth to run Jepsen test on release-3.4?
@ahrtr sure š
I may release 3.4.20 sometime next week when all remaining tasks(see below) are done,
@serathius @spzala @ptabor @hexfusion Please kindly let me know if you have any comments or concerns.
@ahrtr thank you! No concerns, and sounds great.
@ahrtr sure š
@endocrimes any update on the Jepsen test on release-3.4?
https://github.com/etcd-io/etcd/issues/14256 isn't a blocker any more, please see my comment https://github.com/etcd-io/etcd/issues/14256#issuecomment-1202083883
@ahrtr running them now :)
Hi @ahrtr @lavacat @serathius found one last one that may need to be back ported? https://github.com/etcd-io/etcd/pull/13824 Please let me know if that is the case. Thanks.
Hi @ahrtr @lavacat @serathius found one last one that may need to be back ported? #13824 Please let me know if that is the case. Thanks.
Thanks @vivekpatani .
etcd 3.5.4 (and main) is using DialContext, while etcd 3.4.19 (and release-3.4 latest code) is using Dial. So release-3.4 doesn't have the issue https://github.com/etcd-io/etcd/issues/13810.
I agree DialContext
is better than Dial
because Dial
has already been deprecated. But Let's just keep it as it's, because we should only cherry pick bug fixes into 3.4.
Talked to @endocrimes , no any new issue found during the Jepsen test. Will release 3.4.20 soon.
3.4.20 is just released, https://github.com/etcd-io/etcd/releases/tag/v3.4.20 .
Thanks everyone in the CC list. It's really the result of great team work!
Thanks again !!!
Thanks for doing this.
etcd 3.5.4 (and main) is using DialContext, while etcd 3.4.19...
etcd-3.5 latest release is not using DialContext
or am I missing something?
@ahrtr
@vivekpatani The reason is the PR https://github.com/etcd-io/etcd/pull/13824 has been cherry picked to release-3.5 yet. Please feel free to deliver a PR for that. thx
In release-3.5, DialContext is used in the transport, but Dial is used in the ValidateSecureEndpoints
.
In release-3.4, both places are using Dial
.
We intentionally skipped back-porting some bug fixes in 3.4.19 in order to minimize the impact. Since 3.4.19 is already released and validated to be working well, so we can consider to release 3.4.20 to backport some PRs.
Pipelines/test
There are still some flaky test failures from time to time. As previous release, any fix to stabilize the pipeline is welcome.
Issues/PRs that should be included in 3.4.20
Issues/PRs not considered in 3.4.20
Please feel free to chime in if you think any PR/issues need to be investigated or included in 3.4.20. Please also feel free to let us know if anyone has any concerns or comment. cc @ptabor @serathius @spzala @hexfusion @mitake @lavacat @endocrimes @dims @liggitt @neolit123