Closed fabianfett closed 10 months ago
@simonjbeaumont can you take a look at this?
Can you confirm whether your container has /dev/vsock and whether you are starting it with --privileged?
❯ docker run --privileged --rm -v $PWD:/code -w /code -it swift:5.9.1-jammy ls /dev/vsock
/dev/vsock
❯ docker run --rm -v $PWD:/code -w /code -it swift:5.9.1-jammy ls /dev/vsock
ls: cannot access '/dev/vsock': No such file or directory
FWIW, I can run this test without a crash with swift:5.9.1-jammy with --privileged:
❯ docker run --privileged --rm -v $PWD:/code -w /code -it swift:5.9.1-jammy swift test --filter VsockAddressTest.testGetLocalCID
Building for debugging...
Build complete! (6.90s)
Test Suite 'Selected tests' started at 2023-11-06 13:18:07.422
Test Suite 'VsockAddressTest' started at 2023-11-06 13:18:07.424
Test Case 'VsockAddressTest.testGetLocalCID' started at 2023-11-06 13:18:07.424
Test Case 'VsockAddressTest.testGetLocalCID' passed (0.004 seconds)
Test Suite 'VsockAddressTest' passed at 2023-11-06 13:18:07.427
Executed 1 test, with 0 failures (0 unexpected) in 0.004 (0.004) seconds
Test Suite 'Selected tests' passed at 2023-11-06 13:18:07.427
Executed 1 test, with 0 failures (0 unexpected) in 0.004 (0.004) seconds
Without --privileged it crashes:
❯ docker run --rm -v $PWD:/code -w /code -it swift:5.9.1-jammy swift test --filter VsockAddressTest.testGetLocalCID
Building for debugging...
Build complete! (6.90s)
Test Suite 'Selected tests' started at 2023-11-06 13:18:43.133
Test Suite 'VsockAddressTest' started at 2023-11-06 13:18:43.134
Test Case 'VsockAddressTest.testGetLocalCID' started at 2023-11-06 13:18:43.134
NIOPosix/VsockAddress.swift:223: Fatal error: 'try!' expression unexpectedly raised an error: open(file:oFlag:): No such file or directory (errno: 2)
...[crash dump]...
This behaviour isn't new to 5.9.1; it happens in 5.9.0-jammy too.
I'll note, however, that the crash I've pasted is different from yours. Mine is failing to open /dev/vsock. But yours is failing after performing the ioctl().
What I'm surprised about in both cases is that these tests are running. They have try XCTSkipUnless(System.supportsVsock), which checks that we can create a socket with AF_VSOCK. Clearly this is not enough of a guard.
On Linux, we can make that also check for the presence of /dev/vsock, which would skip the test in my case above.
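A minimal sketch of what such a combined guard could look like. The function name `shouldRunVsockTests` and its parameters are hypothetical and not part of SwiftNIO; `socketSupported` stands in for the result of the existing AF_VSOCK socket-creation check.

```swift
import Foundation

/// Hypothetical helper (not SwiftNIO API): decides whether vsock tests should run.
/// `socketSupported` stands in for the existing AF_VSOCK socket-creation check.
func shouldRunVsockTests(
    socketSupported: Bool,
    isLinux: Bool,
    devicePath: String = "/dev/vsock"
) -> Bool {
    // Without AF_VSOCK sockets there is nothing to test on any platform.
    guard socketSupported else { return false }
    // On Linux the local CID is read through the device node, so require it too.
    if isLinux {
        return FileManager.default.fileExists(atPath: devicePath)
    }
    return true
}

#if os(Linux)
let runningOnLinux = true
#else
let runningOnLinux = false
#endif

// On Linux without /dev/vsock this yields false even though the socket check passed.
print(shouldRunVsockTests(socketSupported: true, isLinux: runningOnLinux))
```

In a test, the result could feed `XCTSkipUnless`, so that an unprivileged container skips the test instead of hitting the `try!` crash shown above.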
This is a bug I can make a patch for, but it's not your bug...
However, we need to dig a little deeper on your environment to work out what's going on, because it seems you do have /dev/vsock but IOCTL_VM_SOCKETS_GET_LOCAL_CID isn't getting the local CID.
In @fabianfett's case, the following code is failing:
var cid = Self.any.rawValue
try Posix.ioctl(fd: fd, request: request, ptr: &cid) // IOCTL_VM_SOCKETS_GET_LOCAL_CID
precondition(cid != Self.any.rawValue)
Essentially this checks that, after a successful ioctl() call, the address is not VMADDR_CID_ANY.
I've dug a little more on this and think that we've been too strict. Looking at some implementations...
- VMADDR_CID_ANY ^0
- VMADDR_CID_ANY ^1
- VMADDR_CID_LOCAL ^2
- VMADDR_CID_ANY in the error path ^3
All that to say... we probably shouldn't make any precondition on the result of IOCTL_VM_SOCKETS_GET_LOCAL_CID, since it appears there are some places where it may intentionally be VMADDR_CID_ANY.
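To illustrate why a precondition on the result is too strict, here is a small sketch that treats VMADDR_CID_ANY as one valid outcome among several rather than a fatal condition. The numeric values are the well-known CIDs from the Linux header `<linux/vm_sockets.h>`; the type `WellKnownCID` and the function `describeLocalCID` are hypothetical names for illustration, not SwiftNIO API.

```swift
/// Well-known CIDs as defined in the Linux header <linux/vm_sockets.h>.
enum WellKnownCID: UInt32 {
    case hypervisor = 0      // VMADDR_CID_HYPERVISOR
    case local = 1           // VMADDR_CID_LOCAL
    case host = 2            // VMADDR_CID_HOST
    case any = 0xFFFF_FFFF   // VMADDR_CID_ANY (-1U)
}

/// Hypothetical helper: describes a raw CID (for example one returned by the
/// ioctl) without trapping on any particular value.
func describeLocalCID(_ raw: UInt32) -> String {
    switch WellKnownCID(rawValue: raw) {
    case .some(.any): return "any (no specific local CID reported)"
    case .some(.local): return "local (loopback)"
    case .some(.host): return "host"
    case .some(.hypervisor): return "hypervisor"
    case .none: return "guest CID \(raw)"
    }
}

print(describeLocalCID(0xFFFF_FFFF))
print(describeLocalCID(3))
```

The point of the sketch is only that every `UInt32` maps to a description, so the caller decides what to do with VMADDR_CID_ANY instead of the process crashing.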
I am struggling to reproduce this locally in both a Linux VM and a Linux container, but I think the path forward here is a patch that does the following:
- Extend the hasVsockSupport guard that conditionally runs the test to include the presence of /dev/vsock when on Linux.
- Remove the precondition that checks that the local CID as returned by the ioctl() is not VMADDR_CID_ANY.
@fabianfett has provided me some information about his VM.
My VM (where I cannot reproduce the crash) is using a different vsock transport from his. This is mine...
$ lsmod | grep vsock
vmw_vsock_virtio_transport 24576 0
vmw_vsock_virtio_transport_common 53248 1 vmw_vsock_virtio_transport
vsock 61440 2 vmw_vsock_virtio_transport_common,vmw_vsock_virtio_transport
This is @fabianfett's:
$ lsmod | grep vsock
vsock_loopback 16384 0
vmw_vsock_virtio_transport_common 40960 1 vsock_loopback
vmw_vsock_vmci_transport 32768 0
vsock 49152 3 vmw_vsock_virtio_transport_common,vsock_loopback,vmw_vsock_vmci_transport
vmw_vmci 90112 1 vmw_vsock_vmci_transport
There's a notable difference in transport, which, based on the above comment, could be the source of the different behaviour.
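For comparing environments like these two, a rough sketch that pulls the vsock-related module names out of `lsmod` output and shows which ones appear in only one of them. The helper name `vsockModules(fromLsmod:)` is hypothetical, and it assumes the module name is the first whitespace-separated column of each line, as in the listings above.

```swift
import Foundation

/// Hypothetical helper: returns the names of loaded modules (first column of
/// `lsmod` output) whose name contains "vsock".
func vsockModules(fromLsmod output: String) -> [String] {
    var modules: [String] = []
    for line in output.split(separator: "\n") {
        // The first whitespace-separated column of `lsmod` is the module name.
        guard let name = line.split(separator: " ").first else { continue }
        if name.contains("vsock") {
            modules.append(String(name))
        }
    }
    return modules
}

// The two listings from this thread.
let mine = """
vmw_vsock_virtio_transport 24576 0
vmw_vsock_virtio_transport_common 53248 1 vmw_vsock_virtio_transport
vsock 61440 2 vmw_vsock_virtio_transport_common,vmw_vsock_virtio_transport
"""
let theirs = """
vsock_loopback 16384 0
vmw_vsock_virtio_transport_common 40960 1 vsock_loopback
vmw_vsock_vmci_transport 32768 0
vsock 49152 3 vmw_vsock_virtio_transport_common,vsock_loopback,vmw_vsock_vmci_transport
vmw_vmci 90112 1 vmw_vsock_vmci_transport
"""

// Modules present in only one of the two environments.
let onlyTheirs = Set(vsockModules(fromLsmod: theirs)).subtracting(vsockModules(fromLsmod: mine))
let onlyMine = Set(vsockModules(fromLsmod: mine)).subtracting(vsockModules(fromLsmod: theirs))
print("only in theirs:", onlyTheirs.sorted())
print("only in mine:", onlyMine.sorted())
```

This only lists loaded modules; it does not tell you which transport the kernel actually selected for a given socket.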
@fabianfett would you mind trying out the patch here in your environment? https://github.com/apple/swift-nio/pull/2588
When running the SwiftNIO tests on a:
using:
I run into NIOPosix/VsockAddress.swift:228: Precondition failed. Full crash: