baidu / braft

An industrial-grade C++ implementation of the Raft consensus algorithm based on brpc, widely used inside Baidu to build highly available distributed systems.
Apache License 2.0

brpc reports busy and the server gets killed #66

Closed. pdu closed this issue 5 years ago.

pdu commented 5 years ago

While loading a massive snapshot through braft, I saw the server get killed. There is only one node in the Raft group.

W0924 05:53:46.894353 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:53:49.880662 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:53:59.887006 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:12.745045 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:30.896412 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:45.414910 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:54:54.228179 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:55:20.424473 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:55:30.647188 8047 external/com_github_pdu_brpc/src/bvar/detail/sampler.cpp:139] bvar is busy at sampling for 2 seconds!
W0924 05:56:18.754196 8056 external/com_github_pdu_brpc/src/brpc/server.cpp:325] UpdateDerivedVars is too busy!
W0924 05:56:19.068273 8049 external/com_github_pdu_brpc/src/brpc/global.cpp:207] GlobalUpdate is too busy!
Killed

pdu commented 5 years ago

I found it may be an out-of-memory (OOM) issue; let me check that first. Thanks!
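The bare `Killed` line at the end of the log is what a shell prints when a process is terminated by SIGKILL, which is also the signal the Linux OOM killer sends, so it fits the OOM suspicion. The sketch below shows how a parent process can tell a SIGKILL death apart from a normal exit using `waitpid()`; `DescribeExit`, `RunAndDescribe`, `KillSelf`, and `ExitCleanly` are hypothetical helpers for illustration, not part of braft or brpc, and a SIGKILL alone does not prove an OOM kill.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Summarize a status word returned by waitpid(). A shell prints "Killed"
// when a job ends because of SIGKILL, which is also the signal the Linux
// OOM killer sends. SIGKILL alone does not prove an OOM kill; the kernel
// log (dmesg) is the authoritative source.
std::string DescribeExit(int status) {
    if (WIFEXITED(status)) {
        return "exited with code " + std::to_string(WEXITSTATUS(status));
    }
    if (WIFSIGNALED(status)) {
        const int sig = WTERMSIG(status);
        if (sig == SIGKILL) {
            return "killed by SIGKILL";
        }
        return "terminated by signal " + std::to_string(sig);
    }
    return "unknown status";
}

// Hypothetical helper: run `body` in a forked child, wait for it, and
// describe how it ended.
std::string RunAndDescribe(void (*body)()) {
    const pid_t pid = fork();
    if (pid < 0) {
        return "fork failed";
    }
    if (pid == 0) {
        body();
        _exit(0);  // leave the child without flushing the parent's buffers
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return "waitpid failed";
    }
    return DescribeExit(status);
}

// Sample child bodies for the helper above.
void KillSelf() { raise(SIGKILL); }
void ExitCleanly() {}
```

For example, `RunAndDescribe(KillSelf)` yields `"killed by SIGKILL"`. Inside Docker the same event usually surfaces as exit code 137 (128 + 9), and `docker inspect --format '{{.State.OOMKilled}}' <container>` reports whether the kernel OOM-killed the container.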

PFZheng commented 5 years ago

This may happen when some bvar takes too long to sample. Are you using any self-defined bvars?
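One reading of the warning text, which is an assumption not confirmed in this thread, is that the bvar sampler thread works on a roughly one-second period and complains once two consecutive rounds overrun that period and leave no time to sleep. The class below is a simplified, hypothetical model of that rule, not brpc's source; it also suggests why memory pressure, which slows the whole process down, can produce the same warning without any self-defined bvar.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a fixed-period sampler loop (NOT the
// actual brpc implementation). A round that finishes within `period_us`
// leaves time to sleep; `threshold` consecutive overrunning rounds produce
// one "busy" warning, after which the counter starts over.
class OverrunDetector {
public:
    OverrunDetector(int64_t period_us, int threshold)
        : period_us_(period_us), threshold_(threshold) {}

    // Record one round that took `elapsed_us`; returns true when a
    // "busy at sampling" warning would be emitted.
    bool Record(int64_t elapsed_us) {
        if (elapsed_us < period_us_) {
            consecutive_overruns_ = 0;  // the loop had time to sleep
            return false;
        }
        if (++consecutive_overruns_ >= threshold_) {
            consecutive_overruns_ = 0;  // reset after warning
            return true;
        }
        return false;
    }

private:
    int64_t period_us_;
    int threshold_;
    int consecutive_overruns_ = 0;
};

// Count warnings for a sequence of round durations, assuming a 1 s period
// and a threshold of 2 consecutive overruns.
int CountWarnings(const std::vector<int64_t>& rounds_us,
                  int64_t period_us = 1000000, int threshold = 2) {
    OverrunDetector detector(period_us, threshold);
    int warnings = 0;
    for (int64_t r : rounds_us) {
        if (detector.Record(r)) {
            ++warnings;
        }
    }
    return warnings;
}
```

Under this model, a single slow round is tolerated, while a process that is stalled (for example by swapping near a memory limit) overruns round after round and logs the warning repeatedly, as in the log above.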

pdu commented 5 years ago

@PFZheng No, I haven't. I ran the program in Docker with restricted memory usage, which caused the above problem.
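Since the container's memory limit turned out to be the cause, it can help to check that limit from inside the container before loading a large snapshot. The sketch below assumes Linux and the conventional cgroup mount points (`/sys/fs/cgroup/memory.max` for cgroup v2, `/sys/fs/cgroup/memory/memory.limit_in_bytes` for cgroup v1); `ParseMemoryLimit` and `ReadContainerMemoryLimit` are hypothetical helpers, not part of braft or brpc. It needs C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// Parse the first line of a cgroup memory-limit file. cgroup v2 writes
// "max" when there is no limit; cgroup v1 writes a byte count (a very
// large value there effectively means "unlimited", which this sketch does
// not special-case). Returns std::nullopt when no numeric limit is found.
std::optional<uint64_t> ParseMemoryLimit(const std::string& text) {
    std::string s = text;
    while (!s.empty() && (s.back() == '\n' || s.back() == ' ')) {
        s.pop_back();
    }
    if (s.empty() || s == "max") {
        return std::nullopt;
    }
    try {
        size_t pos = 0;
        const uint64_t value = std::stoull(s, &pos);
        if (pos != s.size()) {
            return std::nullopt;  // trailing garbage
        }
        return value;
    } catch (...) {
        return std::nullopt;  // not a number, or out of range
    }
}

// Try the cgroup v2 file first, then the cgroup v1 file.
std::optional<uint64_t> ReadContainerMemoryLimit() {
    const char* paths[] = {
        "/sys/fs/cgroup/memory.max",                    // cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  // cgroup v1
    };
    for (const char* path : paths) {
        std::ifstream in(path);
        if (!in) {
            continue;
        }
        std::string line;
        std::getline(in, line);
        if (auto limit = ParseMemoryLimit(line)) {
            return limit;
        }
    }
    return std::nullopt;
}
```

Comparing this limit with the memory your state machine needs while `on_snapshot_load` runs is one way to decide between raising the container limit (for example with `docker run --memory`) and loading the snapshot incrementally.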

Edward-xk commented 5 years ago

So you have solved this problem yourself, right? May I close this issue now?