Closed: rlenferink closed this 7 months ago
I would like to drop support for PubSub bundles for Apache Celix 3.0.0 and if we do that, IMO this does not need to be solved.
If we would like to keep the PubSub bundles, I think the best solution is to only set ZMQ_THREAD_PRIORITY or ZMQ_THREAD_SCHED_POLICY if this is explicitly enabled through a config property.
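A minimal sketch of that opt-in gate, assuming the property value arrives as a plain string and that the existing permission check result is passed in. The property key and both function names are hypothetical and are not part of Celix:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical property key, for illustration only (not an existing Celix property). */
#define EXAMPLE_RT_SCHED_ENABLED_KEY "EXAMPLE_PSA_ZMQ_RT_SCHED_ENABLED"

/* True only for an explicit "true"; an unset (NULL) or any other value
 * leaves the default scheduling untouched. */
static bool example_rtSchedEnabled(const char* propertyValue) {
    return propertyValue != NULL && strcmp(propertyValue, "true") == 0;
}

/* The ZMQ thread options are only touched when the user opted in
 * AND the permission check passed. */
static bool example_shouldSetZmqThreadOptions(const char* propertyValue, bool gotPermission) {
    return example_rtSchedEnabled(propertyValue) && gotPermission;
}
```

Only when example_shouldSetZmqThreadOptions returns true would the caller go on to call zmq_ctx_set with ZMQ_THREAD_SCHED_POLICY and ZMQ_THREAD_PRIORITY; otherwise the context keeps libzmq's defaults and the crashing code path is never reached.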
According to the documentation, the host machine's kernel must be configured properly (CONFIG_RT_GROUP_SCHED): https://docs.docker.com/config/containers/resource_constraints/#configure-the-realtime-scheduler
And my local Ubuntu does not support this.
PubSub correctly provides configuration options for this. It seems to me a pure testing configuration issue: an additional CMake option like RUN_IN_CONTAINER (and a corresponding Conan option) should be enough to make these tests use another set of *.properties.
The pubsub_zmq tests fail (SEGV) when running within a container. This is because the user in the container may be the root user (uid = 0), which makes this check succeed: https://github.com/apache/celix/blob/e7aee1259a4c61463be8fcfa5dd4612a3a756192/bundles/pubsub/pubsub_admin_zmq/src/pubsub_zmq_topic_receiver.c#L643-L649
gotPermission is later used to determine whether the scheduling priority can be set: https://github.com/apache/celix/blob/e7aee1259a4c61463be8fcfa5dd4612a3a756192/bundles/pubsub/pubsub_admin_zmq/src/pubsub_zmq_topic_receiver.c#L655
When this is called as root within a container (uid 0) while the user outside the container is a rootless user, the tests segfault (unable to call pthread_setschedparam). This is the line where libzmq eventually crashes:
https://github.com/zeromq/libzmq/blob/4097855ddaaa65ed7b5e8cb86d143842a594eebd/src/thread.cpp#L345
libzmq doesn't handle this too nicely and I am not sure whether this can be solved.
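One possible workaround, sketched here under assumptions, is to probe whether realtime scheduling is actually permitted instead of relying on uid == 0 alone. The function name is hypothetical, and the probe uses sched_setscheduler on the calling thread rather than the pthread_setschedparam call libzmq makes, on the assumption that both are refused under the same conditions on Linux:

```c
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: tries to switch the calling thread to SCHED_FIFO at the
 * lowest priority and restores the original settings afterwards.
 * Returns true only if the kernel accepted the realtime policy. */
static bool example_canUseRealtimeScheduling(void) {
    struct sched_param original;
    int originalPolicy = sched_getscheduler(0);
    if (originalPolicy < 0 || sched_getparam(0, &original) != 0) {
        return false; /* cannot read the current settings, so do not risk it */
    }

    struct sched_param probe;
    probe.sched_priority = sched_get_priority_min(SCHED_FIFO);
    if (probe.sched_priority < 0) {
        return false;
    }

    if (sched_setscheduler(0, SCHED_FIFO, &probe) != 0) {
        return false; /* refused, typically with EPERM in a rootless container */
    }

    /* Probe succeeded: put the original policy and priority back. */
    (void)sched_setscheduler(0, originalPolicy, &original);
    return true;
}
```

If the probe returns false, the receiver could leave gotPermission false and skip the ZMQ thread options, so libzmq never reaches the failing call.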
I tried with the suggested libcap, and after that simply fell back to using the capsh command, but there cap_sys_nice can be set.
Any suggestions to solve this?