apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Automatically exclude slow machines #1017

Open etschannen opened 5 years ago

etschannen commented 5 years ago

If a single process in a database is much slower than the other processes, the entire database will slow down. This means that failures which cause a process to slow down are much worse than failures that kill a process completely.

The goal of this project is to detect when we are limited by a process which is performing much worse than other processes in the cluster, and automatically exclude it.

When we are being held back by a slow process, it is hard to determine what speed the other processes could achieve. Therefore, we should have a background process that periodically writes metadata about achieved throughput to the database, so that current performance can be compared against what the cluster has achieved in the past.
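
As a rough illustration of the throughput-history idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `ThroughputHistory` class, the use of a median as the baseline, and the `ratio` threshold are hypothetical, and an in-memory deque stands in for the metadata that would actually be written to the database.

```python
from collections import deque
from statistics import median


class ThroughputHistory:
    """Hypothetical in-memory stand-in for throughput metadata
    that a background process would periodically write to the database."""

    def __init__(self, max_samples=100):
        # Keep only the most recent samples so the baseline follows the workload.
        self.samples = deque(maxlen=max_samples)

    def record(self, ops_per_second):
        """Store one periodic measurement of achieved throughput."""
        self.samples.append(ops_per_second)

    def baseline(self):
        """Median of the recorded samples, or None if there is no history yet."""
        return median(self.samples) if self.samples else None

    def is_degraded(self, current_ops_per_second, ratio=0.5):
        """True if current throughput is below `ratio` times the baseline.

        `ratio` is an assumed tuning knob, not a value from the issue.
        """
        base = self.baseline()
        if base is None:
            return False  # nothing to compare against yet
        return current_ops_per_second < ratio * base
```

A recorded baseline only says that the cluster is slower than it used to be. It does not say which process is responsible, and a workload change can trigger it just as a failing disk can.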

This approach might produce false positives when the workload profile changes, so we should limit the background process to excluding at most one machine.
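
The one-exclusion cap could be sketched as follows. This is not an implementation from FoundationDB: the function name `pick_exclusion_candidate`, the per-process latency metric, and the `slow_factor` threshold are all assumptions chosen for illustration.

```python
from statistics import median


def pick_exclusion_candidate(latency_ms_by_process, already_excluded,
                             slow_factor=3.0, max_exclusions=1):
    """Return the one process worth excluding, or None.

    latency_ms_by_process: hypothetical metric, process id -> latency in ms.
    already_excluded: set of process ids this detector has excluded so far.
    slow_factor: assumed threshold; a process must be this many times
                 slower than the median to count as an outlier.
    """
    # Hard cap: never exclude more than `max_exclusions` processes,
    # so a false positive can cost at most one machine.
    if len(already_excluded) >= max_exclusions:
        return None

    active = {p: v for p, v in latency_ms_by_process.items()
              if p not in already_excluded}
    if len(active) < 3:
        return None  # too few processes to tell an outlier from the norm

    typical = median(active.values())
    worst = max(active, key=active.get)
    if typical > 0 and active[worst] > slow_factor * typical:
        return worst
    return None
```

For example, with latencies `{"a": 2.0, "b": 2.2, "c": 1.9, "d": 15.0}` and nothing excluded yet, the sketch returns `"d"`. Once any process is in `already_excluded`, it returns `None` regardless of the measurements.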

alexmiller-apple commented 5 years ago

"IASO: A Fail-Slow Detection and Mitigation Framework for Distributed Storage Services" is relevant to this topic.