Closed rwmjones closed 6 years ago
errno might be a red herring. Here's the strace output:
openat(AT_FDCWD, "/var/tmp/fedora-27.img", O_RDWR) = 6
ioctl(6, BLKGETSIZE64, 0x7ffc239931f8) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(6, {st_mode=S_IFREG|0644, st_size=6442450944, ...}) = 0
The exit happens later on.
It seems to fail somewhere during option negotiation, but instead of printing why, it exits and prints the random value of errno from this much earlier (non-)error.
This strace seems to make no sense, but here is the tail end where the failure happens:
write(2, "** \33[1;32mMessage\33[0m: \33[34m16:5"..., 64** Message: 16:57:43.734: Starting to serve
) = 64
read(5, "%`\225\23\0\0\0\0\34\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0", 28) = 28
mmap(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f84ddb88000
mprotect(0x7f84ddb89000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(strace: Process 6668 attached
child_stack=0x7f84de387fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f84de3889d0, tls=0x7f84de388700, child_tidptr=0x7f84de3889d0) = 6668
[pid 6668] set_robust_list(0x7f84de3889e0, 24 <unfinished ...>
[pid 6642] read(5, <unfinished ...>
[pid 6668] <... set_robust_list resumed> ) = 0
[pid 6668] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f84d5b88000
[pid 6668] munmap(0x7f84d5b88000, 38240256) = 0
[pid 6668] munmap(0x7f84dc000000, 28868608) = 0
[pid 6668] mprotect(0x7f84d8000000, 135168, PROT_READ|PROT_WRITE) = 0
[pid 6668] prctl(PR_SET_NAME, "pool") = 0
[pid 6668] pread64(6, "\353c\220\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
[pid 6668] write(5, "gDf\230\0\0\0\0\34\0\0\0\1\0\0\0", 16) = 16
[pid 6668] write(5, "\353c\220\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 6668] futex(0x1c9d5b0, FUTEX_WAIT_PRIVATE, 0, {tv_sec=0, tv_nsec=499998761} <unfinished ...>
[pid 6642] <... read resumed> "%`\225\23\0\0\0\0\34\0\0\0\2\0\0\0\0\0\0\0\0\0\20\0\0\0\20\0", 28) = 28
[pid 6642] futex(0x1c9d5b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 6668] <... futex resumed> ) = 0
[pid 6642] read(5, <unfinished ...>
[pid 6668] pread64(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 4096) = 4096
[pid 6668] write(5, "gDf\230\0\0\0\0\34\0\0\0\2\0\0\0", 16) = 16
[pid 6668] write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 6668] futex(0x1c9d5b0, FUTEX_WAIT_PRIVATE, 1, {tv_sec=0, tv_nsec=499998805} <unfinished ...>
[pid 6642] <... read resumed> "%`\225\23\0\0\0\0\34\0\0\0\3\0\0\0\0\0\0\1\177\377\0\0\0\0\20\0", 28) = 28
[pid 6642] futex(0x1c9d5b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 6668] <... futex resumed> ) = 0
[pid 6642] read(5, <unfinished ...>
[pid 6668] pread64(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 6442385408) = 4096
[pid 6668] write(5, "gDf\230\0\0\0\0\34\0\0\0\3\0\0\0", 16) = 16
[pid 6668] write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 6668] futex(0x1c9d5b0, FUTEX_WAIT_PRIVATE, 2, {tv_sec=0, tv_nsec=499998907} <unfinished ...>
[pid 6642] <... read resumed> "%`\225\23\0\0\0\0\34\0\0\0\4\0\0\0\0\0\0\1\177\377\340\0\0\0\20\0", 28) = 28
[pid 6642] futex(0x1c9d5b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 6668] <... futex resumed> ) = 0
[pid 6642] read(5, <unfinished ...>
[pid 6668] pread64(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 6442442752) = 4096
[pid 6668] write(5, "gDf\230\0\0\0\0\34\0\0\0\4\0\0\0", 16) = 16
[pid 6668] write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[pid 6668] futex(0x1c9d5b0, FUTEX_WAIT_PRIVATE, 3, {tv_sec=0, tv_nsec=499998621} <unfinished ...>
[pid 6642] <... read resumed> "", 28) = 0
[pid 6642] futex(0x1c9d5b0, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 6668] <... futex resumed> ) = 0
[pid 6642] write(2, "Error: Connection dropped: Inapp"..., 58Error: Connection dropped: Inappropriate ioctl for device
<unfinished ...>
[pid 6668] futex(0x1c9d090, FUTEX_WAIT_PRIVATE, 0, {tv_sec=14, tv_nsec=999998934} <unfinished ...>
[pid 6642] <... write resumed> ) = 58
[pid 6642] write(2, "Exiting.\n", 9Exiting.
) = 9
[pid 6642] exit_group(1) = ?
[pid 6668] <... futex resumed>) = ?
[pid 6668] +++ exited with 1 +++
+++ exited with 1 +++
Oh I see, the read(2)
call returns 0 (EOF) so it must be the client which is closing the connection for some reason. On the client side I see:
block nbd0: Connection timed out
[ 54.756467] block nbd0: shutting down sockets
[ 54.759558] print_req_error: I/O error, dev nbd0, sector 12582896
[ 54.763358] print_req_error: I/O error, dev nbd0, sector 12582896
[ 54.767949] Buffer I/O error on dev nbd0, logical block 1572862, async page read
Note there is still a server bug here: The server shouldn't use %m
in the EOF case because in that case read(2)
does not set errno.
The client error is probably https://github.com/NetworkBlockDevice/nbd/issues/71. I will try with a newer nbd-client. I've changed the bug summary to reflect the server-side bug.
About 25% of the time when I connect to nbd-server from the nbd kernel client running on another host, nbd-server dies at connection time with:
I collected a stack trace which is not very interesting, it just says that one of the underlying system calls is failing with ENOTTY.
I'm just serving a regular local file:
The server is current nbd from git, running as non-root on a recent Linux kernel.
The client is Linux kernel nbd module 4.18.5-300.fc29.x86_64 with nbd-client 3.17.