firstbatchxyz / dkn-compute-node

Compute Node of Dria Knowledge Network.
Apache License 2.0

bug: resource usage #140

Open erhant opened 3 weeks ago

erhant commented 3 weeks ago

Problem

The process sometimes reaches its open file descriptor limit (ulimit), after which API calls fail with errors such as Os { code: 24, kind: Uncategorized, message: "No file descriptors available" })) } during a DNS lookup.

How to Reproduce

Not yet known.

Expected Behaviour

The node should not run out of file descriptors, so this error should not occur.

Version

erhant commented 3 weeks ago

https://github.com/hyperium/hyper/issues/1422 related?

erhant commented 3 weeks ago

https://discuss.libp2p.io/t/rust-reading-from-socket-slower-with-every-event/460/2 also related maybe

erhant commented 2 weeks ago

by running lsof -c <process-name-here> on the compute node, we have seen that file descriptor usage grows with the peer count, eventually hitting the default limit of 1024.
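For anyone who wants to watch this from inside the process instead of via lsof, here is a minimal sketch. It assumes Linux (it reads /proc/self/fd and /proc/self/limits) and uses only the standard library. The function names are made up for illustration and are not part of the compute node's code.

```rust
use std::fs;
use std::io;

/// Count the file descriptors currently open in this process by listing
/// /proc/self/fd (Linux only). The directory handle used for the listing
/// is itself counted, so the value may be one higher than expected.
fn open_fd_count() -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir("/proc/self/fd")? {
        entry?; // propagate any error hit while iterating
        count += 1;
    }
    Ok(count)
}

/// Extract the soft "Max open files" limit from the text of
/// /proc/self/limits. Returns None if the line is missing or the
/// limit is "unlimited".
fn parse_soft_fd_limit(limits: &str) -> Option<u64> {
    limits
        .lines()
        .find(|line| line.starts_with("Max open files"))
        // Columns: "Max" "open" "files" <soft> <hard> <units>
        .and_then(|line| line.split_whitespace().nth(3))
        .and_then(|soft| soft.parse().ok())
}

/// Read the soft open-file limit of the current process (Linux only).
fn soft_fd_limit() -> io::Result<Option<u64>> {
    let limits = fs::read_to_string("/proc/self/limits")?;
    Ok(parse_soft_fd_limit(&limits))
}
```

Logging open_fd_count() next to the peer count at a fixed interval should make the correlation visible, and comparing it with soft_fd_limit() shows how much headroom is left before error code 24 appears.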

on a separate note, the peer count tracks the number of established outgoing connections almost 1:1, so as a quick solution we have added a limiter on this count, which effectively caps the total number of peers of a node.

erhant commented 1 week ago

setting the maximum number of established outgoing connections lets us cap this resource usage
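The thread does not show how the limiter is implemented in the node, so the following is only a sketch of the idea under that assumption: a counter that refuses new outgoing connections once a fixed cap is reached. OutgoingLimiter and its methods are hypothetical names, not the node's or libp2p's API.

```rust
/// Hypothetical cap on established outgoing connections.
/// Each connection is assumed to hold at least one file descriptor,
/// so capping connections also bounds descriptor usage.
#[derive(Debug)]
struct OutgoingLimiter {
    max_established: usize,
    established: usize,
}

impl OutgoingLimiter {
    fn new(max_established: usize) -> Self {
        Self { max_established, established: 0 }
    }

    /// Reserve a slot before dialing. Returns false when the cap is
    /// reached, in which case the caller should skip the dial.
    fn try_acquire(&mut self) -> bool {
        if self.established < self.max_established {
            self.established += 1;
            true
        } else {
            false
        }
    }

    /// Free a slot when an outgoing connection closes.
    fn release(&mut self) {
        self.established = self.established.saturating_sub(1);
    }

    /// Number of slots currently in use.
    fn in_use(&self) -> usize {
        self.established
    }
}
```

With a cap well below the descriptor limit (1024 by default, per the comment above), the remaining descriptors stay available for DNS lookups, files, and incoming connections.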