apache / celix

Apache Celix is a framework for C and C++14 to develop dynamic modular software applications using component and in-process service-oriented programming.
https://celix.apache.org/
Apache License 2.0

pubsub_zmq aborts when running within a container #658

Closed rlenferink closed 7 months ago

rlenferink commented 10 months ago

The pubsub_zmq tests fail with a segmentation fault (SEGV) when run inside a container. The cause is that the user in the container may be root (uid = 0), which makes this check succeed:

https://github.com/apache/celix/blob/e7aee1259a4c61463be8fcfa5dd4612a3a756192/bundles/pubsub/pubsub_admin_zmq/src/pubsub_zmq_topic_receiver.c#L643-L649

The gotPermission flag is later used to decide whether the scheduling priority may be set:

https://github.com/apache/celix/blob/e7aee1259a4c61463be8fcfa5dd4612a3a756192/bundles/pubsub/pubsub_admin_zmq/src/pubsub_zmq_topic_receiver.c#L655

When this code runs as root inside a container (uid 0) while the user outside the container is an unprivileged (rootless) user, the tests segfault because the call to pthread_setschedparam fails.

This is the line where libzmq eventually crashes:

https://github.com/zeromq/libzmq/blob/4097855ddaaa65ed7b5e8cb86d143842a594eebd/src/thread.cpp#L345

libzmq doesn't handle this too nicely and I am not sure whether this can be solved.
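
One possible way to avoid reaching that libzmq call is to probe whether the scheduling policy can really be applied, instead of inferring it from uid 0. The sketch below is not Celix code: can_use_sched_policy is a hypothetical helper, and it uses sched_setscheduler on the calling thread as a stand-in for the pthread_setschedparam call, on the assumption that both hit the same kernel permission check on Linux.

```c
#include <sched.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Celix or libzmq): returns true only if
 * the calling thread is actually allowed to switch to `policy`.
 * Assumes Linux, where pid 0 refers to the calling thread.
 */
static bool can_use_sched_policy(int policy) {
    /* Remember the current settings so they can be restored afterwards. */
    int old_policy = sched_getscheduler(0);
    struct sched_param old_param;
    memset(&old_param, 0, sizeof(old_param));
    if (old_policy < 0 || sched_getparam(0, &old_param) != 0) {
        return false;
    }

    /* Use the lowest valid priority for the requested policy as the probe. */
    int min_prio = sched_get_priority_min(policy);
    if (min_prio < 0) {
        return false; /* unknown or unsupported policy */
    }

    struct sched_param probe;
    memset(&probe, 0, sizeof(probe));
    probe.sched_priority = min_prio;
    if (sched_setscheduler(0, policy, &probe) != 0) {
        return false; /* typically EPERM when the kernel refuses the request */
    }

    /* The probe succeeded: put the original policy and priority back. */
    (void)sched_setscheduler(0, old_policy, &old_param);
    return true;
}
```

With something like this, gotPermission could be derived from can_use_sched_policy(SCHED_FIFO) rather than from the uid check. Note that the probe briefly changes the calling thread's policy, so it is best run once at start-up.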

I tried the suggested libcap and, after that, simply fell back to the capsh command, but there cap_sys_nice is reported as available:

root@fedora:/home/rlenferink/workspace/asf/celix/celix-container# capsh --print
Current: =ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore

Any suggestions to solve this?

pnoltes commented 10 months ago

I would like to drop support for PubSub bundles for Apache Celix 3.0.0 and if we do that, IMO this does not need to be solved.

If we would like to keep the PubSub bundles, I think the best solution is to set ZMQ_THREAD_PRIORITY or ZMQ_THREAD_SCHED_POLICY only if this is explicitly enabled through a config property.
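
A minimal sketch of that opt-in idea, under stated assumptions: the property keys below are hypothetical (they are not the actual Celix PubSub property names), and getenv stands in for a Celix properties lookup. The decision logic is kept free of libzmq so it can be read on its own.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical property keys, chosen for illustration only. */
#define EXAMPLE_RT_ENABLED_KEY "EXAMPLE_ZMQ_RT_SCHED_ENABLED"
#define EXAMPLE_RT_PRIO_KEY    "EXAMPLE_ZMQ_RT_SCHED_PRIO"

/*
 * Returns true and stores the priority in *out only when the feature is
 * explicitly enabled ("true") and the priority parses as an integer within
 * [min_prio, max_prio]. In every other case nothing should be configured.
 */
static bool resolve_rt_priority(const char *enabled, const char *prio,
                                long min_prio, long max_prio, int *out) {
    if (enabled == NULL || strcmp(enabled, "true") != 0) {
        return false; /* not opted in: leave the ZMQ defaults untouched */
    }
    if (prio == NULL || *prio == '\0') {
        return false;
    }
    char *end = NULL;
    errno = 0;
    long value = strtol(prio, &end, 10);
    if (errno != 0 || *end != '\0' || value < min_prio || value > max_prio) {
        return false; /* malformed or out-of-range priority */
    }
    *out = (int)value;
    return true;
}

/* Same decision, reading the hypothetical keys from the environment. */
static bool resolve_rt_priority_from_env(long min_prio, long max_prio, int *out) {
    return resolve_rt_priority(getenv(EXAMPLE_RT_ENABLED_KEY),
                               getenv(EXAMPLE_RT_PRIO_KEY),
                               min_prio, max_prio, out);
}
```

Only when the function returns true would the caller go on to apply ZMQ_THREAD_SCHED_POLICY and ZMQ_THREAD_PRIORITY, for example with zmq_ctx_set on the context; otherwise the options are never touched and libzmq does not attempt to change the thread scheduling.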

PengZheng commented 10 months ago

According to the documentation, the host machine's kernel must be configured properly (CONFIG_RT_GROUP_SCHED): https://docs.docker.com/config/containers/resource_constraints/#configure-the-realtime-scheduler My local Ubuntu does not support this.

PubSub correctly provides configuration options for this. It seems to me to be purely a test configuration issue: an additional CMake option like RUN_IN_CONTAINER (and a corresponding Conan option) should be enough to make these tests use another set of *.properties.