apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Multiple DCs: Add new status information for monitoring multiple DC deployments #315

Open etschannen opened 6 years ago

etschannen commented 6 years ago

The most basic information to monitor is the version lag between the primary DC and the remote DC.

Data distribution monitoring should probably also be reported separately for each DC, although this may be tricky to implement.

The basic machine fault tolerance calculations should be updated, and a new DC fault tolerance value should be added.

Network latencies between DCs would be nice to have.

Since satellites may not be in active use in some configurations, their failure monitoring may need to be done differently.

etschannen commented 6 years ago

The only thing we need for 6.0 is the version lag.

etschannen commented 6 years ago

Status can send a TLogQueuingMetricsRequest to a primary TLog (isLocal && hasBestLocation) and to a remote TLog (!isLocal && hasBestLocation) to get their versions, and then report the difference between the two. Dividing that difference by VERSIONS_PER_SECOND gives an approximate time lag in seconds.
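
To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above. It is not the actual status code: `TLogInfo` and `version_lag` are hypothetical names, the TLog versions are assumed to have already been fetched, and `VERSIONS_PER_SECOND` is assumed to be 1,000,000. Clamping a negative difference to zero is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Assumption: roughly one million versions are issued per second.
VERSIONS_PER_SECOND = 1_000_000


@dataclass
class TLogInfo:
    """Hypothetical stand-in for what a TLog reports back to status."""
    is_local: bool            # mirrors isLocal
    has_best_location: bool   # mirrors hasBestLocation
    version: int              # version reported by the TLog


def version_lag(tlogs: Iterable[TLogInfo]) -> Optional[Tuple[int, float]]:
    """Return (lag in versions, approximate lag in seconds), or None
    if no suitable primary or remote TLog is available."""
    tlogs = list(tlogs)
    # Primary TLog: isLocal && hasBestLocation
    primary = next((t for t in tlogs if t.is_local and t.has_best_location), None)
    # Remote TLog: !isLocal && hasBestLocation
    remote = next((t for t in tlogs if not t.is_local and t.has_best_location), None)
    if primary is None or remote is None:
        return None
    # Assumption: a remote that momentarily reports a newer version counts as zero lag.
    lag_versions = max(primary.version - remote.version, 0)
    return lag_versions, lag_versions / VERSIONS_PER_SECOND
```

For example, under these assumptions a primary at version 5,000,000 and a remote at version 2,500,000 would be reported as 2,500,000 versions behind, or about 2.5 seconds.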

etschannen commented 6 years ago

The version lag has been added in https://github.com/apple/foundationdb/pull/492, so I am moving the milestone for this issue to 6.1.