eclipse / mosquitto

Eclipse Mosquitto - An open source MQTT broker
https://mosquitto.org

Failure with kernels not supporting CLOCK_BOOTTIME #3089

Open diego-santacruz opened 1 month ago

diego-santacruz commented 1 month ago

I need to run mosquitto (client using libmosquitto) on a platform which has a Linux kernel without CLOCK_BOOTTIME but where the libc headers declare CLOCK_BOOTTIME, since they are decoupled from the kernel itself.

Under these circumstances libmosquitto misbehaves and fails with "Keepalive exceeded" errors. I traced the issue to mosquitto_time(), which calls clock_gettime(CLOCK_BOOTTIME, &tp) whenever CLOCK_BOOTTIME is defined by the libc headers, without checking the return value. Since the kernel on this platform has no CLOCK_BOOTTIME, the call fails with EINVAL (it returns -1 and sets errno) and the value returned by mosquitto_time() is basically garbage, so any timestamp comparisons go haywire.

The fix is relatively simple, and basically boils down to probing, at runtime, for CLOCK_BOOTTIME and using CLOCK_MONOTONIC as fallback.
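A minimal sketch of that idea, for illustration only: this is not the code from the linked PR, and `monotonic_seconds()` is a hypothetical stand-in for a `mosquitto_time()`-style helper. It assumes a POSIX system with `clock_gettime()` and `CLOCK_MONOTONIC` available.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper (not a libmosquitto function): returns seconds from
 * a monotonic clock, or (time_t)-1 if no suitable clock could be read. */
static time_t monotonic_seconds(void)
{
	struct timespec tp;

#ifdef CLOCK_BOOTTIME
	/* The libc headers may define CLOCK_BOOTTIME even when the running
	 * kernel does not implement it. In that case clock_gettime() returns
	 * -1 with errno set to EINVAL and tp must not be used. */
	if(clock_gettime(CLOCK_BOOTTIME, &tp) == 0){
		return tp.tv_sec;
	}
#endif
	/* Fallback: CLOCK_MONOTONIC, still checking the return value. */
	if(clock_gettime(CLOCK_MONOTONIC, &tp) == 0){
		return tp.tv_sec;
	}
	return (time_t)-1;
}
```

This version probes on every call, which keeps it stateless and thread-safe at the cost of one extra failing syscall per call on kernels without CLOCK_BOOTTIME. Caching the result of the first probe would avoid that, but needs care if the function can be called from several threads.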

Mosquitto version: 2.0.18

karlp commented 1 month ago

have you considered fixing your libc headers to be sane? :) you're just going to fall into other holes with other software :)

diego-santacruz commented 1 month ago

The libc headers are, by design, actually decoupled from the kernel that will run on the target system.

In our case, the context is a Yocto-based build which targets multiple different devices with a single user space (hence common libc headers). The devices share a common architecture but must use different kernels due to hardware support. On one of them the kernel is an old one that does not support CLOCK_BOOTTIME (the hardware vendor never pushed support for their SoC to the mainline kernel and never updated their kernel either, so we are stuck with an old one). And yes, there are other holes here and there due to using an old kernel, but we chase them. Of all the software we use, mosquitto is the only one which uses CLOCK_BOOTTIME without checking that the syscall succeeded.

Although we can argue that kernels without CLOCK_BOOTTIME are not supported, calling a syscall without checking for success is always a problem.

The PR I have linked solves that in what I think is a fairly simple and maintainable way. I did notice a typo in the type of a variable, though, so I'll update the PR shortly.

diego-santacruz commented 1 month ago

I have amended the linked PR #3090 with the type fix I noticed.