Open bobrik opened 10 years ago
Out of curiosity, have you made sure you are not suffering from port exhaustion or conntrack overflow? Check dmesg and netstat. This is usually the problem when I see Ceph go asymptotic.
dmesg does not reveal anything; netstat shows many connections (>50k) in TIME_WAIT, because osds open and close connections very often in the problematic case.
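For anyone who wants to reproduce the count without netstat, here is a minimal sketch, assuming a Linux host where `/proc/net/tcp` and `/proc/net/tcp6` are readable. The helper names (`count_tcp_states`, `read_or_empty`) are made up for illustration and are not part of Ceph or of the image. It tallies sockets by state and prints the local ephemeral port range next to them. The comparison is only a rough indicator, since port exhaustion is per destination address and port rather than a global limit.

```python
from collections import Counter

# State codes from the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].upper()
        counts[TCP_STATES.get(code, code)] += 1
    return counts


def read_or_empty(path):
    """Return file contents, or an empty string if the file is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


if __name__ == "__main__":
    totals = count_tcp_states(read_or_empty("/proc/net/tcp"))
    totals += count_tcp_states(read_or_empty("/proc/net/tcp6"))
    for state, number in totals.most_common():
        print(f"{state:12} {number}")
    port_range = read_or_empty("/proc/sys/net/ipv4/ip_local_port_range").split()
    if len(port_range) == 2:
        low, high = int(port_range[0]), int(port_range[1])
        print(f"ephemeral ports available: {high - low + 1}")
```

Running it inside the container and on the host separately should show whether the TIME_WAIT sockets pile up in the osd's network namespace or on the host.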
So, it's definitely port exhaustion, then. Obviously, that's an effect, not the root cause, but it does, perhaps, allow you to filter out some symptoms (load due to I/O wait, for instance).
This happens with empty osds too, and they cannot cause any I/O wait.
Correct; I am merely pointing out that, for the purposes of testing a hypothesis, it is possible to separate out symptoms which are related to port exhaustion, such as load due to I/O wait.
Can you publish 0.80.7 images? Maybe the bug was fixed already, even though I wasn't able to find it in the changelog.
Done
No luck, still the same issue.
It seems that you cannot run many osds per node with `ulexus/ceph-osd`. I posted a description to the ceph-users mailing list to figure out what is wrong: http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-November/044996.html