[Hono] dispatch-router Pod failing to start HTTP server

eclipse / packages

IoT Packages project

https://eclipse.org/packages

Eclipse Public License 2.0

[Hono] dispatch-router Pod failing to start HTTP server #312

Open jlengelsen opened 3 years ago

jlengelsen commented 3 years ago

The dispatch-router Pod fails to start the HTTP server when installing either the cloud2edge or the hono Helm package into a Kubernetes cluster provisioned with kind on my machine (Fedora 34). It seems like the Pod is trying to allocate 1073741816 file descriptors, which is exactly the open-files ulimit of the host OS (the default on Fedora).

These are the pod logs related to the issue:

HTTP (error) OOM allocating 1073741816 fds
HTTP (error) ZERO RANDOM FD
SERVER (critical) No memory starting HTTP server
...
SERVER (error) No HTTP support to listen on 0.0.0.0:8088
...
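
For anyone trying to reproduce this, the number in the first log line can be compared against the open-files limit that a process actually sees. The following is a minimal sketch, not taken from the router or libwebsockets sources: the helper names are hypothetical, and the 8-byte entry size is only an assumption (one pointer-sized slot per descriptor on a 64-bit system) used to show why an allocation sized by this limit could fail.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-files limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def table_bytes(max_fds, entry_size=8):
    """Rough size of a per-descriptor table.

    entry_size=8 is an assumption (one pointer per descriptor on 64-bit),
    not a value read from libwebsockets.
    """
    return max_fds * entry_size


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    # The value reported in the pod logs above.
    logged = 1073741816
    print(f"table for {logged} fds: {table_bytes(logged) / 2**30:.1f} GiB")
```

Run inside the container and on the host, this should show whether the container inherits the host limit. Under the stated assumption, a table for 1073741816 descriptors would need about 8 GiB.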

Any ideas how I can fix that?

calohmn commented 3 years ago

This seems related to DISPATCH-1897 and https://github.com/warmcat/libwebsockets/issues/2449. There, the recommendation is to adapt the ulimit value. In libwebsockets v4.2.0 and newer there seems to be a fix, but even the newest dispatch router image (quay.io/interconnectedcloud/qdrouterd:1.17.0) is still using an older version (3.2.1-1.fc30).

jlengelsen commented 3 years ago

Confirmed. I have built the qdrouterd image locally with libwebsockets v4.2.2 and the error was gone. Thanks for the hint.

sophokles73 commented 3 years ago

@jlengelsen can this issue be closed?

jlengelsen commented 3 years ago

Well, the issue is not solved yet. Installing either the cloud2edge or the hono Helm package into a cluster where no sane ulimits for containers are set still fails. To solve the issue, the qdrouterd image used in the packages has to upgrade to libwebsockets v4.2.0 or newer, which isn't the case yet.

sophokles73 commented 3 years ago

It seems that it is in the hands of neither the Hono nor the IoT Packages project to resolve the issue. There even seem to be different opinions as to whether this is actually a bug/problem or works as designed. In any case, until the Qpid project decides to use libwebsockets >= 4.2.0, it looks like setting ulimits as advised in https://github.com/warmcat/libwebsockets/issues/1769 is a reasonable workaround, isn't it?

jlengelsen commented 3 years ago

Agreed, it is the project maintainers' decision whether this issue should be tracked here. Setting ulimits works, but since there is no easy way yet to set ulimits for containers running in a k8s cluster (https://github.com/kubernetes/kubernetes/issues/3595), it is annoying to do...
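
One possible way to apply the ulimit workaround per process, rather than through the container runtime, is to lower the soft open-files limit right before the target process starts. The sketch below is only an illustration of that idea. The function name and the default of 1024 are hypothetical, it is not part of the Hono or IoT Packages charts, and it assumes the affected library sizes its table from the soft limit.

```python
import resource
import subprocess


def run_with_nofile_limit(cmd, soft_limit=1024, **kwargs):
    """Run cmd with a lowered soft RLIMIT_NOFILE (hypothetical helper)."""

    def lower_limit():
        # Runs in the child process just before exec; the hard limit is kept.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if hard == resource.RLIM_INFINITY:
            new_soft = soft_limit
        else:
            new_soft = min(soft_limit, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    return subprocess.run(cmd, preexec_fn=lower_limit, check=True, **kwargs)
```

The same effect can usually be had from a shell wrapper that calls `ulimit -n` before `exec`. Either way, the container's command has to be overridden, which is part of why this remains awkward in a k8s cluster.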