douban / Kenshin

Kenshin: A time-series database alternative to Graphite Whisper with 40x improvement in IOPS
Apache License 2.0

what's the problem with "failed to write() to 127.0.0.1:2013: uncomplete write" #14

Closed zzl0 closed 7 years ago

zzl0 commented 7 years ago

https://github.com/douban/graphite-kenshin/issues/7

I tested Kenshin under heavy load. CPU usage reached almost 100%, and I got the error "failed to write() to 127.0.0.1:2013: uncomplete write". Does that mean some data was lost?
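
For readers unfamiliar with the term: the message reads like a short (partial) write, where a write() call accepts fewer bytes than it was asked to send. The sketch below only reproduces that situation with a local socket pair; it is not carbon-c-relay or Kenshin code, and `demo_short_write` and the 8 MiB payload size are hypothetical choices for illustration. Whether data is actually lost depends on how the sender handles the unsent remainder, which this thread does not establish.

```python
import socket


def demo_short_write(payload_size=8 * 1024 * 1024):
    """Return (requested, sent) for one non-blocking send on a socket pair.

    Nothing reads from the other end, so the kernel buffer fills up and
    send() reports fewer bytes than were requested (a short write).
    """
    writer, reader = socket.socketpair()
    try:
        writer.setblocking(False)      # do not wait for buffer space
        payload = b"x" * payload_size  # larger than a typical socket buffer
        sent = writer.send(payload)    # may accept only part of the payload
        return payload_size, sent
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    requested, sent = demo_short_write()
    print("requested:", requested, "sent:", sent)
```

A sender that sees `sent < requested` has to keep the remaining bytes and retry later; a receiver that is saturated on CPU, as described here, is one plausible reason the buffer stays full.
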
zzl0 commented 7 years ago

@luckywarrior https://github.com/grobian/carbon-c-relay/issues/17#issuecomment-55879560

camper42 commented 7 years ago

@luckywarrior

In our production environment, every server runs 16 instances of rurouni cache.

luckywarrior commented 7 years ago

I tested with 6 instances per server, on a machine with a 4-core CPU and 4 GB of memory. When the load reached almost 100%, I got the following results:

  1. 200k metrics received / 10 secs / carbon-c-relay
  2. 0 relay dropped
  3. 250 iops

Also, one of the metric graphs disappears every time I refresh. What is causing that?

camper42 commented 7 years ago

I sometimes see the same problem with from=-1min; it disappears with from=-15min or a longer period.

I have no idea why this happens. Any ideas, @zzl0?

zzl0 commented 7 years ago

@camper42 Maybe it is related to the cache time. If you can reproduce it, please add some logging and debug it.

zzl0 commented 7 years ago

@luckywarrior @camper42 We should create another issue to track the disappearing-graph problem.

@luckywarrior Can we close this issue?

luckywarrior commented 7 years ago

It seems that this problem is related to the max open files limit. Changing /etc/security/limits.conf, /etc/sysctl.conf, and /etc/pam.d/login will solve the problem.
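
A quick way to check whether a process is actually running with the raised limit is to read it from inside Python. This is a minimal sketch using the standard `resource` module (Unix only); the helper names `open_file_limits` and `raise_soft_limit_to_hard` are hypothetical and not part of Kenshin. It only inspects or adjusts the limit of the current process and does not replace the system-wide configuration changes mentioned above.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) max-open-files limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit_to_hard():
    """Try to raise the soft limit up to the hard limit.

    An unprivileged process may raise its soft limit only as far as the
    hard limit. Returns the limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard and hard != resource.RLIM_INFINITY:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms refuse this value; keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", open_file_limits())
    print("after: ", raise_soft_limit_to_hard())
```

If the soft limit printed here is still low (1024 is a common default) after editing the files above, the new settings have probably not been picked up by the session that starts the process.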