axboe / liburing

Library providing helpers for the Linux kernel io_uring support
MIT License

`test/socket-io-cmd.t` randomly fails: values does not match: 1360 != 0 #1136

Closed: ammarfaizi2 closed this issue 2 months ago

ammarfaizi2 commented 2 months ago

`test/socket-io-cmd.t` fails intermittently: some runs pass, others fail.

Kernel: 6.8.2-afs-2024-04-02 Arch: x86_64

Reproduce

cd test;
./runtests-loop.sh socket-io-cmd.t;

After several iterations, it fails at random; the mismatched value in the output also varies from run to run.

Failure outputs:

Running test socket-io-cmd.t                                        values does not match: 588 != 0
Test socket-io-cmd.t failed with ret 1
Tests failed (1): <socket-io-cmd.t>
Tests failed at loop 70
Running test socket-io-cmd.t                                        values does not match: 308 != 0
Test socket-io-cmd.t failed with ret 1
Tests failed (1): <socket-io-cmd.t>
Tests failed at loop 62
Running test socket-io-cmd.t                                        values does not match: 1334 != 0
Test socket-io-cmd.t failed with ret 1
Tests failed (1): <socket-io-cmd.t>
Tests failed at loop 22
Running test socket-io-cmd.t                                        values does not match: 52 != 0
Test socket-io-cmd.t failed with ret 1
Tests failed (1): <socket-io-cmd.t>
Tests failed at loop 14
Running test socket-io-cmd.t                                        values does not match: 1360 != 0
Test socket-io-cmd.t failed with ret 1
Tests failed (1): <socket-io-cmd.t>
Tests failed at loop 15

axboe commented 2 months ago

I traced this down the stack, and it looks correct in the sense that raw_ioctl() is indeed finding an skb and hence returns the length. Not sure who's sending that or where it's coming from, but it's most certainly there. I think for now, since we're close to release, I'll just add a single retry loop in case they differ before failing. They should find the same at that point, at least.

Though it would be nice to know where this data is coming from...
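
The "single retry" idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual change made to the test: `probe_fn`, `probes_agree`, `struct fake_sock`, and the two `fake_probe_*` functions are hypothetical names introduced here, and the stand-in probes only simulate data showing up between the first pair of reads.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback type: returns a pending-byte count for a socket. */
typedef int (*probe_fn)(void *ctx);

/*
 * Sample both probes and compare. On a mismatch, re-sample both up to
 * `retries` more times before giving up. The last sampled values are
 * left in *val_a and *val_b so the caller can report them.
 */
static bool probes_agree(probe_fn a, probe_fn b, void *ctx, int retries,
			 int *val_a, int *val_b)
{
	assert(a && b && val_a && val_b && retries >= 0);

	for (int attempt = 0; attempt <= retries; attempt++) {
		*val_a = a(ctx);
		*val_b = b(ctx);
		if (*val_a == *val_b)
			return true;
	}
	return false;
}

/* Stand-in state for the demo probes below (illustration only). */
struct fake_sock {
	int calls;
};

/* Sees an empty queue on the first call, 76 bytes on every later call. */
static int fake_probe_a(void *ctx)
{
	struct fake_sock *s = ctx;

	return s->calls++ == 0 ? 0 : 76;
}

/* Always sees the 76 bytes. */
static int fake_probe_b(void *ctx)
{
	(void)ctx;
	return 76;
}
```

With `retries` set to 0 the stand-in probes disagree (0 vs. 76); with `retries` set to 1 the second pair of reads matches and the check passes. A retry like this only hides a packet that arrives between the two reads; it does not explain where the packet came from.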

axboe commented 2 months ago

Can you retest with the current tree?

ammarfaizi2 commented 2 months ago

> Not sure who's sending that or where it's coming from, but it's most certainly there. I think for now, since we're close to release, I'll just add a single retry loop in case they differ before failing. They should find the same at that point, at least.
>
> Though it would be nice to know where this data is coming from...

Oh, what a riveting mystery we've stumbled upon!

Unfortunately, I am not familiar with SOCK_RAW. Adding Breno, just in case he knows something about this, as he's the one who added the feature. Also, Netdev people can probably quickly recognize what's going on.

Cc: @leitao

Breno, do you have any idea where the data is coming from?

ammarfaizi2 commented 2 months ago

> Can you retest with the current tree?

Looks good now: tested 1k+ iterations with no failures.

ammarfaizi2 commented 2 months ago

Just a quick test: when the program is isolated in a network namespace that has only a loopback interface, the test does not fail.

ip netns add iou;
ip netns exec iou bash -i;
ip link set up dev lo;
# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
# Commit before the 'retry workaround'.
git checkout 1cbd4cffa88177;

cd test;
make -j;
./runtests-loop.sh socket-io-cmd.t;

axboe commented 2 months ago

Some dumps from just now, showing what's in the socket. I tend to see 76 or 40 bytes pretty reliably:

Running test socket-io-cmd.t                                        recv=76
45 0 0 4c 98 3 0 0 40 6 ca 98 a 0 2 2 a 0 2 f c1 f0 0 16 92 7e 5f 22 59 7d 64 fa 50 18 ff ff 6c 7f 0 0 8 68 74 c6 48 9 26 3f 5c 6c 96 f 11 16 c5 5e 1b fc 3a 8d 6d be d e9 37 f4 83 87 ed 8a d3 6b 50 e2 65 c 
values does not match: 76 != 0
Running test socket-io-cmd.t                                        recv=40
45 0 0 28 9e 41 0 0 40 6 c4 7e a 0 2 2 a 0 2 f c7 f6 0 16 93 75 55 5e ed 40 6a 8c 50 10 ff ff 8f 16 0 0
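
One way to see what these bytes are is to read them as an IPv4 packet. That is an assumption, but the leading `45` byte and the length fields (`4c` = 76, `28` = 40) are consistent with it. The sketch below is a hypothetical helper, not part of liburing or the test: `struct pkt_summary`, `summarize_ipv4`, and `sample_recv40` are names made up for illustration, and `sample_recv40` is the 40-byte dump above copied into an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fields pulled out of an IPv4 header plus the start of a TCP/UDP header. */
struct pkt_summary {
	uint8_t  version;
	uint8_t  ihl_bytes;   /* IPv4 header length in bytes */
	uint8_t  protocol;    /* 6 = TCP, 17 = UDP */
	uint16_t total_len;
	uint8_t  src[4];
	uint8_t  dst[4];
	uint16_t sport;       /* 0 unless protocol is TCP or UDP */
	uint16_t dport;
	uint8_t  tcp_flags;   /* 0 unless protocol is TCP */
};

/* Returns 0 on success, -1 if the buffer does not look like IPv4. */
static int summarize_ipv4(const uint8_t *buf, size_t len,
			  struct pkt_summary *out)
{
	assert(buf && out);
	memset(out, 0, sizeof(*out));

	if (len < 20)
		return -1;
	out->version = buf[0] >> 4;
	out->ihl_bytes = (uint8_t)((buf[0] & 0x0f) * 4);
	if (out->version != 4 || out->ihl_bytes < 20 || len < out->ihl_bytes)
		return -1;

	out->total_len = (uint16_t)((buf[2] << 8) | buf[3]);
	out->protocol = buf[9];
	memcpy(out->src, buf + 12, 4);
	memcpy(out->dst, buf + 16, 4);

	/* Transport header starts right after the IPv4 header. */
	const uint8_t *l4 = buf + out->ihl_bytes;
	size_t l4_len = len - out->ihl_bytes;

	if ((out->protocol == 6 || out->protocol == 17) && l4_len >= 4) {
		out->sport = (uint16_t)((l4[0] << 8) | l4[1]);
		out->dport = (uint16_t)((l4[2] << 8) | l4[3]);
	}
	if (out->protocol == 6 && l4_len >= 14)
		out->tcp_flags = l4[13];
	return 0;
}

/* The recv=40 dump from the comment above, byte for byte. */
static const uint8_t sample_recv40[40] = {
	0x45, 0x00, 0x00, 0x28, 0x9e, 0x41, 0x00, 0x00,
	0x40, 0x06, 0xc4, 0x7e, 0x0a, 0x00, 0x02, 0x02,
	0x0a, 0x00, 0x02, 0x0f, 0xc7, 0xf6, 0x00, 0x16,
	0x93, 0x75, 0x55, 0x5e, 0xed, 0x40, 0x6a, 0x8c,
	0x50, 0x10, 0xff, 0xff, 0x8f, 0x16, 0x00, 0x00,
};
```

Under that reading, the 40-byte dump is a TCP segment (protocol 6) from 10.0.2.2 port 51190 to 10.0.2.15 port 22, with only the ACK flag (0x10) set and no payload. Port 22 is conventionally SSH, so this may simply be unrelated traffic on the machine's regular interface. That would fit the network namespace observation earlier in the thread, but it is an inference from the bytes, not something confirmed here.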