baiwfg2 / awesome-readings

Notes on takeaways from reading various articles and papers

SLA & latency tail & coordinated omission #32

Open baiwfg2 opened 3 years ago

baiwfg2 commented 3 years ago

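Database systems pay close attention to tail latency, for example P99 latency. A database system must report percentile metrics such as P99; looking only at average latency is wrong. Tail latency affects both user experience and the system's SLA, so I think it is well worth understanding which factors cause latency and how to eliminate them.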
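To make the point about averages concrete, here is a minimal sketch with made-up numbers (the latency values and the nearest-rank `percentile` helper are my own assumptions, not taken from any of the references). Two slow requests out of 100 barely show up in the median, distort the mean, and dominate P99.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 98 fast requests and 2 slow ones.
latencies_ms = [1.0] * 98 + [500.0] * 2

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)

print(f"mean={mean_ms:.2f} ms  p50={p50_ms} ms  p99={p99_ms} ms")
```

The mean here describes no request that actually happened, which is why the percentiles need to be reported alongside it.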

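I put these three topics together because they are related. SLA descriptions usually refer to percentile latencies, and computing those latencies runs into the coordinated omission problem.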
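A rough simulation of coordinated omission, under my own assumptions: a single-connection client that intends to send one request every 10 ms, and a server that answers in 1 ms except for one 1000 ms stall. The function name `simulate` and all numbers are hypothetical, and this is not how wrk2 is implemented. It only illustrates the idea of measuring from the intended send time rather than the actual one.

```python
def simulate(service_times_ms, interval_ms):
    """Closed-loop client that plans to send request i at time i * interval_ms."""
    naive, corrected = [], []
    client_free_at = 0.0  # when the previous response arrived
    for i, service in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # The client cannot send until the previous response is back.
        actual_start = max(intended_start, client_free_at)
        finish = actual_start + service
        naive.append(finish - actual_start)        # ignores time spent waiting to send
        corrected.append(finish - intended_start)  # charges the waiting time to latency
        client_free_at = finish
    return naive, corrected

# Hypothetical workload: 100 requests, 1 ms each, except request 10 stalls for 1000 ms.
service = [1.0] * 100
service[10] = 1000.0

naive, corrected = simulate(service, interval_ms=10.0)
slow_naive = sum(1 for x in naive if x > 100.0)
slow_corrected = sum(1 for x in corrected if x > 100.0)

print(f"samples over 100 ms: naive={slow_naive}, corrected={slow_corrected}")
```

The naive measurement sees a single slow sample, because the client stopped sending while it was blocked. Measured against the intended schedule, every request queued behind the stall is slow too, which is what a real user arriving at that moment would have experienced.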

[1] The Tail at Scale, Jeffrey Dean and Luiz André Barroso, 2013

[2] http://accelazh.github.io/storage/Tail-Latency-Study , the author has collected a lot of material on tail latency problems

[3] https://github.com/giltene/wrk2 , Gil Tene's modified version of wrk, which handles coordinated omission

[4] https://news.ycombinator.com/item?id=10486215

[5] Coordinated Omission in NoSQL Database Benchmarking, paper

[6] https://azure.microsoft.com/en-us/blog/azure-documentdb-service-level-agreements/ , Microsoft's SLA definitions for Cosmos DB, a very useful reference

baiwfg2 commented 3 years ago

[6]

They define a comprehensive SLA that covers throughput, latency, availability, and consistency.

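The availability SLA is well defined: from the number of failed requests in each hour, you can compute the availability for the whole month.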
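A minimal sketch of that calculation, assuming monthly availability is 100% minus the average of the hourly error rates, where an hourly error rate is failed requests divided by total requests in that hour. That formula is my reading of [6] and should be checked against the SLA text. The function name `monthly_availability` and the request counts are hypothetical.

```python
def monthly_availability(hourly_counts):
    """hourly_counts: list of (failed_requests, total_requests), one entry per hour."""
    # Assumption: an hour with no requests counts as error-free.
    error_rates = [failed / total if total else 0.0 for failed, total in hourly_counts]
    average_error_rate = sum(error_rates) / len(error_rates)
    return 100.0 * (1.0 - average_error_rate)

# Hypothetical 30-day month (720 hours): one bad hour with 72% failures, the rest clean.
hours = [(0, 1000)] * 719 + [(720, 1000)]
availability = monthly_availability(hours)

print(f"monthly availability: {availability:.3f}%")
```

Because each hour is weighted equally regardless of traffic, a bad hour at low traffic costs as much availability as a bad hour at peak.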

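The throughput SLA sounds a bit odd; it is tied to rate limiting and RUs (request units). The consistency SLA is also hard to understand: what does "successful request not delivering consistency level" mean?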

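The latency SLA is based on P99 latency with 99% as the threshold; when the measured value falls below that, 25% of the fee is refunded.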

My other questions: https://www.yuque.com/baiwfg2/database/bm0al7