GreptimeTeam / greptimedb

An Open-Source, Cloud-Native, Unified Time Series Database for Metrics, Events, and Logs with SQL/PromQL supported. Available on GreptimeCloud.
https://greptime.com/
Apache License 2.0

[Tracking Issue] Make region failover great again #4161

Open WenyXu opened 4 weeks ago

WenyXu commented 4 weeks ago

What type of enhancement is this?

Refactor, Tech debt reduction

What does the enhancement do?

Currently, the region failover procedure has the following issues:

  1. Region failover cannot migrate a region without data loss.
  2. The failover detector relies on receiving heartbeats from the datanode; if the datanode stops sending heartbeats, we can't detect the failed region.
  3. In-flight failover procedures are not tracked, so we can't trigger a failover procedure again after it fails.
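
To make issues 2 and 3 concrete, here is a minimal sketch of one possible direction. It is not GreptimeDB's implementation. It assumes a periodic timer drives detection, so a datanode that goes completely silent is still noticed, and that in-flight failovers are recorded, so a failed procedure can be triggered again. `FailoverSupervisor`, `RegionId`, and every method name are hypothetical, and timestamps are plain milliseconds to keep the example deterministic. Issue 1 (migration without data loss) is out of scope here.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical region identifier; not GreptimeDB's actual type.
type RegionId = u64;

/// Hypothetical supervisor: remembers the last heartbeat per region and
/// which regions already have a failover procedure in flight.
struct FailoverSupervisor {
    timeout_ms: u64,
    last_heartbeat_ms: HashMap<RegionId, u64>,
    in_flight: HashSet<RegionId>,
}

impl FailoverSupervisor {
    fn new(timeout_ms: u64) -> Self {
        Self {
            timeout_ms,
            last_heartbeat_ms: HashMap::new(),
            in_flight: HashSet::new(),
        }
    }

    /// Record a heartbeat that reported `region` at time `now_ms`.
    fn on_heartbeat(&mut self, region: RegionId, now_ms: u64) {
        self.last_heartbeat_ms.insert(region, now_ms);
    }

    /// Driven by a timer rather than by heartbeat arrival, so silence is
    /// detectable. Returns regions that newly need a failover (sorted) and
    /// marks them as in flight so they are not submitted twice.
    fn tick(&mut self, now_ms: u64) -> Vec<RegionId> {
        let mut failed: Vec<RegionId> = self
            .last_heartbeat_ms
            .iter()
            .filter(|(region, last)| {
                now_ms.saturating_sub(**last) > self.timeout_ms
                    && !self.in_flight.contains(*region)
            })
            .map(|(region, _)| *region)
            .collect();
        failed.sort_unstable();
        self.in_flight.extend(failed.iter().copied());
        failed
    }

    /// Report the outcome of a failover procedure. On failure the region
    /// stays stale, so the next `tick` selects it again.
    fn on_procedure_finished(&mut self, region: RegionId, succeeded: bool, now_ms: u64) {
        self.in_flight.remove(&region);
        if succeeded {
            // Assumption: the region is now served elsewhere, so restart its clock.
            self.last_heartbeat_ms.insert(region, now_ms);
        }
    }
}
```

In this sketch, a region whose last heartbeat is older than `timeout_ms` is returned by `tick` exactly once while its procedure is in flight. Calling `on_procedure_finished(region, false, now)` makes the following `tick` return it again, which is the retry behavior issue 3 asks for.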

Implementation challenges

No response