apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[fix](move-memtable) immediately return error when close wait failed #44344

Closed kaijchen closed 22 hours ago

kaijchen commented 1 day ago

What problem does this PR solve?

Related PR: #38003

Problem Summary:

#38003 introduced a problem where the last sink node could report success even when close wait timed out, which may cause data loss.

We originally made that change hoping to tolerate the failure of a minority of replicas in this step. However, it turns out that when close wait fails, the last sink node can miss tablet reports from downstream nodes.

This PR fixes the problem by returning the close_wait error immediately. The most common close wait error is a timeout, and it should not be tolerated on a per-replica basis anyway.
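
To make the difference concrete, here is a minimal Python sketch of the two behaviors described above. It is a simplified model and not the actual Doris BE code (which is C++): the `Stream` class and the `close_tolerant` and `close_strict` functions are hypothetical names introduced only for illustration, and a status of `None` stands for success.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stream:
    """Hypothetical stand-in for a stream to one downstream node."""
    node_id: int
    close_wait_error: Optional[str] = None  # None means close wait succeeded
    tablets: Tuple[int, ...] = ()           # tablets this node would report

    def close_wait(self) -> Optional[str]:
        # Returns an error message, or None on success.
        return self.close_wait_error


def close_tolerant(streams: List[Stream]) -> Tuple[Optional[str], List[int]]:
    """Models the problematic behavior: a failed close wait is skipped."""
    reported: List[int] = []
    for s in streams:
        if s.close_wait() is not None:
            continue  # failure tolerated, so this node's tablet reports are lost
        reported.extend(s.tablets)
    return None, reported  # reports success even though reports are missing


def close_strict(streams: List[Stream]) -> Tuple[Optional[str], List[int]]:
    """Models the fix: the first close wait error is returned immediately."""
    reported: List[int] = []
    for s in streams:
        err = s.close_wait()
        if err is not None:
            return f"close wait failed on node {s.node_id}: {err}", []
        reported.extend(s.tablets)
    return None, reported


streams = [
    Stream(node_id=1, tablets=(101, 102)),
    Stream(node_id=2, close_wait_error="timeout", tablets=(103,)),
]
tolerant_status, tolerant_tablets = close_tolerant(streams)
strict_status, strict_tablets = close_strict(streams)
```

In this model the tolerant variant returns a success status while tablet 103 is silently absent from the result, which mirrors the data-loss risk described above. The strict variant surfaces the timeout to the caller, so the load fails instead of committing with missing tablet reports.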

Release note

None

doris-robot commented 1 day ago

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

kaijchen commented 1 day ago

run buildall

github-actions[bot] commented 1 day ago

clang-tidy review says "All clean, LGTM! :+1:"

doris-robot commented 1 day ago

TeamCity be ut coverage result:
Function Coverage: 38.02% (9900/26039)
Line Coverage: 29.21% (82824/283546)
Region Coverage: 28.34% (42529/150085)
Branch Coverage: 24.90% (21558/86590)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c040ae01a6cbdc13de30d953d2f666b82aa887b1_c040ae01a6cbdc13de30d953d2f666b82aa887b1/report/index.html

liaoxin01 commented 1 day ago

run buildall

doris-robot commented 1 day ago

TeamCity be ut coverage result:
Function Coverage: 38.02% (9899/26039)
Line Coverage: 29.20% (82790/283546)
Region Coverage: 28.34% (42528/150085)
Branch Coverage: 24.90% (21559/86590)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c040ae01a6cbdc13de30d953d2f666b82aa887b1_c040ae01a6cbdc13de30d953d2f666b82aa887b1/report/index.html

github-actions[bot] commented 1 day ago

PR approved by at least one committer and no changes requested.

github-actions[bot] commented 1 day ago

PR approved by anyone and no changes requested.