childe / gohangout

A Logstash-like tool written in Go. It consumes data from Kafka, processes it, and writes it to ES, ClickHouse, and similar stores.
MIT License

ES8: the 429 status code returned by ES is not handled correctly #233

Closed: zcola closed this issue 9 months ago

zcola commented 10 months ago

I1026 16:40:12.727927       1 bulk_http.go:170] bulk done with execution_id 118 0.638 36000 56426.331
I1026 16:40:13.041604       1 elasticsearch_output.go:150] could NOT get errors in response:{"error":{"root_cause":[{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=1700696668, replica_bytes=0, all_bytes=1700696668, coordinating_operation_bytes=26208468, max_coordinating_and_primary_bytes=1717986918]"}],"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation [coordinating_and_primary_bytes=1700696668, replica_bytes=0, all_bytes=1700696668, coordinating_operation_bytes=26208468, max_coordinating_and_primary_bytes=1717986918]"},"status":429}
I1026 16:40:13.041627       1 bulk_http.go:170] bulk done with execution_id 100 0.839 36000 42908.225
I1026 16:40:13.071678       1 bulk_http.go:153] bulk 36000 docs with execution_id 119
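The second log line appears to show the output looking for the per-item `errors` field of a `_bulk` reply and not finding it, because ES rejected the entire request and returned only a top-level `error` object plus `"status":429`. The Go sketch below shows one way a client could tell those two response shapes apart. The names `classifyBulkResponse`, `bulkResponse`, and `esError` are hypothetical and are not taken from gohangout's source.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// esError holds the request-level error object ES returns when it rejects a whole request.
type esError struct {
	Type   string `json:"type"`
	Reason string `json:"reason"`
}

// bulkResponse covers both shapes: a normal _bulk reply with an "errors" flag,
// and a request-level rejection carrying "error" and "status" (as in the log above).
type bulkResponse struct {
	Errors *bool    `json:"errors"`
	Error  *esError `json:"error"`
	Status int      `json:"status"`
}

// classifyBulkResponse is a hypothetical helper, not gohangout's code.
// The first result is non-zero when ES rejected the entire request;
// the second reports per-document failures in a normal bulk reply.
func classifyBulkResponse(body []byte) (int, bool, error) {
	var r bulkResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return 0, false, err
	}
	switch {
	case r.Errors != nil:
		// Normal bulk reply: individual items may have failed.
		return 0, *r.Errors, nil
	case r.Error != nil && r.Status != 0:
		// Whole request rejected, e.g. es_rejected_execution_exception with 429.
		return r.Status, false, nil
	default:
		return 0, false, errors.New("unrecognized bulk response body")
	}
}

func main() {
	// Same shape as the body logged at elasticsearch_output.go:150 (reason shortened).
	body := []byte(`{"error":{"type":"es_rejected_execution_exception","reason":"rejected execution of coordinating operation"},"status":429}`)
	status, itemErrors, err := classifyBulkResponse(body)
	fmt.Println(status, itemErrors, err)
}
```

A caller that gets a non-zero first result would treat the whole batch as failed, rather than concluding there were no per-item errors to act on.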

Version 1.10


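Trigger conditions: during load testing, the node could only write 100k even with its CPU fully saturated, so in theory three 4-core instances should have been enough. After scaling out to 20-30 instances, the consumption rate rose to 400k, but ES monitoring still showed only 100k being written. Checking the hangout error logs turned up nothing but 429 status codes, which is how I found this problem.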

childe commented 10 months ago

https://github.com/childe/gohangout#retry_response_code

Add 429 to retry_response_code as well, and the request should then be retried.
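As a rough illustration of what putting 429 in a retry list means, here is a minimal Go sketch: a request is re-sent only while its status is in the configured list. The helpers `retryable` and `sendWithRetry`, the `send` callback, and the fixed wait between attempts are all assumptions for illustration. This is not gohangout's actual retry code; the real option is documented at the retry_response_code link above.

```go
package main

import (
	"fmt"
	"time"
)

// retryable reports whether status is in the configured list, mirroring the
// idea behind a retry_response_code option (sketch only, not gohangout's code).
func retryable(status int, retryCodes []int) bool {
	for _, c := range retryCodes {
		if c == status {
			return true
		}
	}
	return false
}

// sendWithRetry calls send until it returns a non-retryable status or maxAttempts
// is reached. send is a hypothetical stand-in for one _bulk request returning its HTTP status.
func sendWithRetry(send func() (int, error), retryCodes []int, maxAttempts int, wait time.Duration) (int, error) {
	var status int
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		status, err = send()
		if err != nil {
			return status, err
		}
		if !retryable(status, retryCodes) {
			return status, nil // success, or a failure we were not told to retry
		}
		if attempt < maxAttempts {
			time.Sleep(wait)
		}
	}
	return status, fmt.Errorf("still got status %d after %d attempts", status, maxAttempts)
}

func main() {
	calls := 0
	fake := func() (int, error) {
		calls++
		if calls < 3 {
			return 429, nil // rejected twice, as in the log
		}
		return 200, nil
	}
	status, err := sendWithRetry(fake, []int{429}, 5, 10*time.Millisecond)
	fmt.Println(status, err, calls) // prints "200 <nil> 3"
}
```

If 429 is missing from the list, the first 429 is returned as a final result and the batch is not re-sent. That matches the behavior reported in this issue.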

zcola commented 9 months ago


That works, but I had assumed the code above would already retry on 429. A 429 is ES telling the client that it should apply backpressure.
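If 429 is treated as backpressure, the client should wait progressively longer after each consecutive rejection instead of re-sending at full speed. A common way to do this is capped exponential backoff. The sketch below is a hypothetical helper (`backoffDelay` is not a gohangout function), and the base and cap values in `main` are arbitrary examples.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns base * 2^attempt, capped at maxDelay, so each
// consecutive 429 makes the client wait longer before re-sending.
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2 // stop doubling once the cap is reached, which avoids overflow
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, 100*time.Millisecond, 5*time.Second))
	}
}
```

The delay sequence is deterministic here for clarity. Real clients often add random jitter so that many instances, such as the 20-30 in this issue, do not all retry in lockstep.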