bus1 / dbus-broker

Linux D-Bus Message Broker
https://github.com/bus1/dbus-broker/wiki
Apache License 2.0

test_uds_edge fails on big endian architectures with kernel 5.15 #280

Closed bluca closed 11 months ago

bluca commented 2 years ago

Since kernel 5.15 became available in Debian, test_uds_edge has started failing on big endian machines (ppc, ppc64, ia64, sparc64):

test-dispatch: ../src/util/test-dispatch.c:181: test_uds_edge: Assertion `c_assert_result && "r == sizeof(b)"' failed.
Aborted

recv() fails with EINVAL here (it returns -1 and sets errno to 22): https://github.com/bus1/dbus-broker/blob/main/src/util/test-dispatch.c#L181

Breakpoint 1, test_uds_edge (run=0) at ../src/util/test-dispatch.c:181
181                 c_assert(r == sizeof(b));
(gdb) p r
$2 = -1
(gdb) p errno
$1 = 22

https://buildd.debian.org/status/package.php?p=dbus-broker

I can reproduce this easily and have access to the affected hardware, but I'm not sure what I am looking for.

dvdhrm commented 2 years ago

Catching up on things now. Is this still happening? The Debian builds seem to be more than 3 months old.

bluca commented 2 years ago

I haven't done a new upload, but I assume yes; will double-check.

bluca commented 2 years ago

@dvdhrm yup, it still happens on an up-to-date Debian unstable on ia64

bluca commented 2 years ago

Tests are not failing anymore with v30 - @dvdhrm did something change that could have affected that?

https://buildd.debian.org/status/package.php?p=dbus-broker

dvdhrm commented 2 years ago

CFLAGS changed, but not in a meaningful way (I hope...). I strongly suspect this is a kernel issue that was fixed by a kernel update. The failing test you saw is one we carry in dbus-broker only to verify a particular kernel behavior we rely on. It has no direct connection to dbus-broker's own code; we just want to keep it around so we notice when things break upstream.

I will keep monitoring this, but if it does not reappear, I am happy to close the issue ;) I am almost done with the backlog, so I will have time to deal with this soon.

bluca commented 2 years ago

Seen it again just now with v31 on sparc64:

https://buildd.debian.org/status/fetch.php?pkg=dbus-broker&arch=sparc64&ver=31-1&stamp=1652826419&raw=0

Kernel: Linux 5.15.0-2-sparc64-smp #1 SMP Debian 5.15.5-2 (2021-12-18) sparc64 (sparc64)
Toolchain package versions: binutils_2.38-4 dpkg-dev_1.21.7 g++-11_11.3.0-1 gcc-11_11.3.0-1 libc6-dev_2.33-7 libstdc++-11-dev_11.3.0-1 libstdc++6_12.1.0-2 linux-libc-dev_5.17.6-1+b1

dvdhrm commented 2 years ago

Ah, but this one is different! This time it fails dequeuing the message.

Edit: ah, no, I think I am wrong on this one.

dvdhrm commented 2 years ago

Btw., the initial problem was that recv() failed with EINVAL, but only in the case where we drain the queue after a shutdown. I have now found the upstream fix for this:

commit f9390b249c90a15a4d9e69fbfb7a53c860b1fcaf
Author: Vincent Whitchurch <vincent.whitchurch@axis.com>
Date:   Fri Nov 19 13:05:21 2021 +0100

    af_unix: fix regression in read after shutdown

I don't know how I missed that fix last time; maybe it was queued on some branch that I did not consult. I am quite certain net-next did not have it queued yet. Anyway, this clearly fixes the problem you described initially.

The fix should be part of 5.16:

$ git describe f9390b249
v5.16-rc1-231-gf9390b249c90

Also, I think I was wrong in my previous assumption: the new report shows the same failure again. I'm not sure why I considered it different; I didn't remember exactly what the initial assertion was.
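
A minimal standalone sketch of the kernel behavior in question, assuming (from the commit title and the description above) that the regression hits recv() on an AF_UNIX stream socket that still has queued data after shutdown(SHUT_RDWR). The function name read_after_shutdown_probe() and the exact call sequence are hypothetical; this is not the code of test_uds_edge in test-dispatch.c.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical probe (not the actual test_uds_edge code): queue one
 * message on a UNIX stream socketpair, shut the receiving end down with
 * SHUT_RDWR (assumed trigger), then try to drain the queue.
 *
 * Returns 0 if the queued message is still readable after shutdown,
 * -errno if a syscall fails (e.g. -EINVAL from recv() on an affected
 * kernel), or -EIO on a short or mismatching transfer.
 */
static int read_after_shutdown_probe(void)
{
    int fds[2];
    const char msg[] = "ping";
    char buf[sizeof(msg)];
    ssize_t r;
    int ret = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -errno;

    /* Queue data from the peer before the receiver shuts down. */
    if (send(fds[0], msg, sizeof(msg), 0) != (ssize_t)sizeof(msg)) {
        ret = -EIO;
        goto out;
    }

    /* Shut down both directions on the receiving socket. */
    if (shutdown(fds[1], SHUT_RDWR) < 0) {
        ret = -errno;
        goto out;
    }

    /* Draining the queue should still yield the queued message. */
    r = recv(fds[1], buf, sizeof(buf), 0);
    if (r < 0)
        ret = -errno;
    else if (r != (ssize_t)sizeof(msg) || memcmp(buf, msg, sizeof(msg)) != 0)
        ret = -EIO;

out:
    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

Called from a small main(), this should return 0 on kernels before 5.15 and on kernels carrying the fix, and -EINVAL on an affected 5.15 kernel, provided the assumption about the shutdown mode holds.
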

dvdhrm commented 2 years ago

Your newest report shows kernel 5.15.5 (Debian 5.15.5-2). I assume it does not have the fix backported yet.

bluca commented 2 years ago

It looks like the fix was backported to v5.15.9, so indeed it's not in the 5.15.5 kernel yet. I have no control over the kernel of the build instances, so I can't do much about it other than wait. But it's good news, as it seems it will be solved soon.

commit 80d709875d920f7ca959040457b7393df706fe44
Author: Vincent Whitchurch <vincent.whitchurch@axis.com>
Date:   Fri Nov 19 13:05:21 2021 +0100

    af_unix: fix regression in read after shutdown

    [ Upstream commit f9390b249c90a15a4d9e69fbfb7a53c860b1fcaf ]

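
To judge from a build log whether a builder is likely affected, the version window from this thread (failures since 5.15, fix in v5.15.9 and v5.16) can be encoded in a small check. This is a hypothetical helper, not part of dbus-broker. It needs the upstream version: in the sparc64 log above, the release string is `5.15.0-2-sparc64-smp` while the actual kernel is Debian 5.15.5-2, so the version has to come from the version string rather than the release.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: given an upstream kernel version such as "5.15.5"
 * (trailing suffixes like "-2" are ignored), report whether it falls in
 * the window discussed in this thread: 5.15.0 up to, but not including,
 * the v5.15.9 stable backport. Release candidates (e.g. v5.16-rc1, which
 * predates the fix) and distro-specific backports are not handled.
 */
static bool kernel_has_read_after_shutdown_bug(const char *version)
{
    int major = 0, minor = 0, patch = 0;

    /* Accept "X.Y" (patch defaults to 0) and "X.Y.Z"; report anything
     * unparsable as not affected. */
    if (sscanf(version, "%d.%d.%d", &major, &minor, &patch) < 2)
        return false;

    return major == 5 && minor == 15 && patch < 9;
}
```

Checking the behavior directly, as dbus-broker's test does, avoids relying on version numbers at all, since distribution kernels may carry the backport under a different version.
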
dvdhrm commented 2 years ago

Perfect! I will leave this open until the problem no longer appears.

dvdhrm commented 11 months ago

I am closing this as solved. The upstream kernel fix is now backported to the stable trees.

Thanks a lot for the report!