If a single process in a database cluster is much slower than its peers, the entire database slows down to match it. Failures that merely degrade a process are therefore much worse than failures that kill it outright: a dead process is detected and replaced, while a slow one quietly throttles the whole cluster.
The goal of this project is to detect when we are limited by a process performing much worse than the other processes in the cluster, and to exclude that process automatically.
While we are being held back by a slow process, it is hard to know what throughput the other processes could achieve on their own. To establish a baseline, a background process should therefore periodically write metadata about achieved throughput to the database itself.
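As a rough illustration, the sketch below shows what such a recorder might look like. Everything in it is assumed rather than taken from this project: `db.set` stands in for whatever key-value write the real cluster exposes, `measure_throughput` for its per-process throughput counter, and the key layout and recording interval are placeholders.

```python
import json
import threading
import time

RECORD_INTERVAL_SECONDS = 60  # illustrative tuning knob, not a real default


class ThroughputRecorder:
    """Periodically writes achieved-throughput metadata into the database.

    `db` is assumed to expose simple get/set key-value operations, and
    `measure_throughput()` is assumed to return this process's recent
    operations per second; both are stand-ins for whatever primitives
    the real cluster provides.
    """

    def __init__(self, db, process_id, measure_throughput):
        self.db = db
        self.process_id = process_id
        self.measure_throughput = measure_throughput
        self._stop = threading.Event()

    def _run(self):
        # Event.wait returns False on timeout, so this loops once per
        # interval until stop() is called.
        while not self._stop.wait(RECORD_INTERVAL_SECONDS):
            record = {
                "process_id": self.process_id,
                "ops_per_second": self.measure_throughput(),
                "timestamp": time.time(),
            }
            # Store under a well-known key prefix so a detector can later
            # compare every process against its peers.
            self.db.set(f"metadata/throughput/{self.process_id}", json.dumps(record))

    def start(self):
        threading.Thread(target=self._run, daemon=True).start()

    def stop(self):
        self._stop.set()
```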
Because a changing workload profile can make that historical baseline misleading, this approach may produce false positives. To bound the damage of a wrong decision, we limit the mechanism to excluding at most one machine at a time.
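A sketch of that safety limit, under the same assumed `db` interface: the detector refuses to act if it has already excluded a machine, so even a badly skewed baseline costs the cluster at most one machine until the marker is cleared. The key name and the slowdown threshold below are illustrative, not part of any real API.

```python
EXCLUSION_KEY = "metadata/auto_exclusion"  # hypothetical marker key
SLOWDOWN_RATIO = 0.5  # illustrative: act only below half the peer median


def maybe_exclude_slow_machine(db, throughputs):
    """Exclude the slowest machine, but never more than one at a time.

    `throughputs` maps machine id -> recent ops/sec, as recorded by the
    background process sketched above. `db.get`/`db.set` are assumed
    key-value primitives. Returns the excluded machine id, or None.
    """
    # If this mechanism already excluded a machine, do nothing: its
    # one-machine budget is spent until an operator clears the marker.
    if db.get(EXCLUSION_KEY) is not None:
        return None

    if len(throughputs) < 2:
        return None  # no peers to compare against

    ranked = sorted(throughputs.items(), key=lambda kv: kv[1])
    slowest_id, slowest_tp = ranked[0]
    peers = sorted(tp for _, tp in ranked[1:])
    median_peer = peers[len(peers) // 2]

    # Exclude only if the slowest machine is far below its peers,
    # which keeps ordinary throughput variance from triggering action.
    if slowest_tp < SLOWDOWN_RATIO * median_peer:
        db.set(EXCLUSION_KEY, slowest_id)  # record the exclusion
        return slowest_id
    return None
```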