Closed bcmills closed 1 year ago
Fallocate is newly enabled on FreeBSD at CL https://golang.org/cl/478035 . cc @tklauser
ZFS on FreeBSD does not support posix_fallocate
. The syscall should return EINVAL
according to man posix_fallocate(2):
... or the operation is not supported by the file system.
The reason for removal of posix_allocate
is also explained here: https://svnweb.freebsd.org/base?view=revision&revision=325320.
All the other builders use UFS2 AFIK, if freebsd-riscv64-unmatched is indeed using ZFS, then we're not handling the error properly.
The test checks for syscall.EOPNOTSUPP
:
https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/fallocate_test.go#L29-L31
Maybe we should test for errNoFallocate
instead, like the one outbuf_nofallocate.go is returning (plus an explicit EINVAL
check in outbuf_freebsd.go converting it to errNoFallocate
)?
I'm also seeing it on my builder which was down for a week. It is a slightly different failure mode (the filesystem is UFS1 in my case):
--- FAIL: TestFallocate (0.00s)
fallocate_test.go:60: unexpected disk usage: got 128 blocks, want at least 2048
fallocate_test.go:60: unexpected disk usage: got 192 blocks, want at least 4096
fallocate_test.go:60: unexpected disk usage: got 256 blocks, want at least 6144
FAIL
FAIL cmd/link/internal/ld 45.338s
FAIL
The syscall should return
EINVAL
That's unfortunate, but common enough that testenv.SyscallIsNotSupported
checks for it explicitly:
https://cs.opensource.google/go/go/+/master:src/internal/testenv/testenv_unix.go;l=31-34;drc=75c2e97c3c11050847fc221480e1939f29c911e5
So probably the test could use that if the system call actually returned EINVAL
as documented.
Unfortunately, the failure mode we're seeing here is not consistent with an error return from the syscall. There is an explicit error check before the unexpected disk usage
check:
https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L35-L37
I think this is a platform-independent bug in the test (CC @golang/runtime). The size being passed to fallocate
is only 1 KiB:
https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L28
Then the sizes passed to mmap
are 1–3 MiB and the test is assuming that the number of reserved blocks is increased accordingly:
https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L43
https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L59
But without subsequent calls to fallocate
, it's not at all clear to me why we should expect the filesystem to have allocated more blocks after the mmap
, if the filesystem may use copy-on-write blocks.
out.Mmap
does fallocate
https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/outbuf_mmap.go;l=23
(out.Mmap
is not just a plain syscall)
Hmm, looks like (*OutBuf).Mmap
is missing casses for EOPNOTSUPP
and EINVAL
, at least. 🤔
https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/outbuf_mmap.go;l=31;drc=058f019e45fc1f18842e0339d7d56d804936263b
But that still wouldn't produce the observed failure mode in the test.
Last time I looked, on all relevant platforms ENOTSUP
and EOPNOTSUPP
are equal. But I guess it doesn't hurt to include both. EINVAL
is arguably weird -- I'd hope it fails if we actually pass a wrong argument to fallocate
. But given this issue, maybe it is fine. (Why FreeBSD/ZFS doesn't return ENOTSUP
or EOPNOTSUPP
? ...)
Attached the truss syscall trace from a tmpfs
flesystem as used on the main FreeBSD 13.1 amd64/i386 builders:
70343: 0.007439040 posix_fallocate(0x3,0x0,0x400) = 0 (0x0)
70343: 0.007685140 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.007693940 posix_fallocate(0x3,0x0,0x100000) = 0 (0x0)
70343: 0.007722300 ftruncate(3,0x100000) = 0 (0x0)
70343: 0.007755530 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.007763790 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35785814016 (0x855003000)
70343: 0.007798940 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=1048576,blksize=4096 },0x0) = 0 (0x0)
70343: 0.007809760 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.007844200 munmap(0x855003000,1048576) = 0 (0x0)
70343: 0.007852720 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008228670 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008237430 posix_fallocate(0x3,0x0,0x200000) = 0 (0x0)
70343: 0.008264920 ftruncate(3,0x200000) = 0 (0x0)
70343: 0.008292320 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008300700 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35762733056 (0x853a00000)
70343: 0.008330000 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=2097152,blksize=4096 },0x0) = 0 (0x0)
70343: 0.008355660 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008363740 munmap(0x853a00000,2097152) = 0 (0x0)
70343: 0.008897930 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008906620 posix_fallocate(0x3,0x0,0x300000) = 0 (0x0)
70343: 0.008933030 ftruncate(3,0x300000) = 0 (0x0)
70343: 0.008959760 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.008968280 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35762733056 (0x853a00000)
70343: 0.008999030 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=3145728,blksize=4096 },0x0) = 0 (0x0)
70343: 0.009024660 nanosleep({ 0.000020000 }) = 0 (0x0)
70343: 0.009033260 munmap(0x853a00000,3145728) = 0 (0x0)
A ZFS directory with lz4 compression enabled reproduces the test failure with a single block:
=== RUN TestFallocate
fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 2048
fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 4096
fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 6144
--- FAIL: TestFallocate (0.00s)
FAIL
truss log (posix_fallocate
returning EINVAL
but ignored?):
75784: 0.007915430 posix_fallocate(0x3,0x0,0x400) ERR#22 'Invalid argument'
75784: 0.007961290 posix_fallocate(0x3,0x0,0x100000) ERR#22 'Invalid argument'
75784: 0.007987920 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.007995870 ftruncate(3,0x100000) = 0 (0x0)
75784: 0.008030060 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35807641600 (0x8564d4000)
75784: 0.008040770 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008076570 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=1048576,blksize
=1048576 },0x0) = 0 (0x0)
75784: 0.008088000 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008141090 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008149300 write(1," fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008177910 munmap(0x8564d4000,1048576) = 0 (0x0)
75784: 0.008196210 posix_fallocate(0x3,0x0,0x200000) ERR#22 'Invalid argument'
75784: 0.008205620 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008235760 ftruncate(3,0x200000) = 0 (0x0)
75784: 0.008256360 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35796287488 (0x855a00000)
75784: 0.008266100 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008302200 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=2097152,blksize
=1048576 },0x0) = 0 (0x0)
75784: 0.008331090 nanosleep({ 0.000020000 }) = 0 (0x0)
75784: 0.008339130 write(1," fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008366400 munmap(0x855a00000,2097152) = 0 (0x0)
75784: 0.008382670 posix_fallocate(0x3,0x0,0x300000) ERR#22 'Invalid argument'
75784: 0.008402730 ftruncate(3,0x300000) = 0 (0x0)
75784: 0.008411540 nanosleep({ 0.000040000 }) = 0 (0x0)
75784: 0.008440800 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35815161856 (0x856c00000)
75784: 0.008461170 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=3145728,blksize=1048576 },0x0) = 0 (0x0)
75784: 0.008487470 write(1," fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008504550 munmap(0x856c00000,3145728) = 0 (0x0)
I don't see any calls to msync
, so I think this works on a tmpfs
filesystem just by coincidence as a MAP_SHARED
file-backed tmpfs
is actually anonymous memory.
@paulzhol, is it possible that internal/syscall/unix.PosixFallocate
is using the wrong syscall ABI for the error return? Otherwise, I don't see how it's possible for the ERR#22
from posix_fallocate
to be ignored.
(The initial call to posix_fallocate
with argument 0x400
is the direct call to out.fallocate
within TestFallocate
, and its return value is checked explicitly.)
@bcmills internal/syscall/unix.PosixFallocate
looks fine to me. But weirdly enough, truss output from the arm builder also returns EINVAL
for posix_allocate on a UFS1 filesystem which has fallocate:
2730: 0.072921729 posix_fallocate(0x3,0x0,0x0,0x400,0x0,0x0) ERR#22 'Invalid argument'
2730: 0.073163969 posix_fallocate(0x3,0x0,0x0,0x100000,0x0,0x0) ERR#22 'Invalid argument'
2730: 0.073726784 ftruncate(3,0x10000000000000) = 0 (0x0)
2730: 0.075205955 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 682622976 (0x28b00000)
2730: 0.076058405 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=1048576,blksize=32768 },0x0) =
0 (0x0)
2730: 0.076769501 write(1," fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
2730: 0.077020856 munmap(0x28b00000,1048576) = 0 (0x0)
2730: 0.077247732 posix_fallocate(0x3,0x0,0x0,0x200000,0x0,0x0) ERR#22 'Invalid argument'
2730: 0.077727160 ftruncate(3,0x20000000000000) = 0 (0x0)
2730: 0.078035234 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 1220542464 (0x48c00000)
2730: 0.078363100 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=2097152,blksize=32768 },0x0) =
0 (0x0)
2730: 0.078878674 write(1," fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
2730: 0.079122998 munmap(0x48c00000,2097152) = 0 (0x0)
2730: 0.079356228 posix_fallocate(0x3,0x0,0x0,0x300000,0x0,0x0) ERR#22 'Invalid argument'
2730: 0.079831698 ftruncate(3,0x30000000000000) = 0 (0x0)
2730: 0.080122792 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 1220542464 (0x48c00000)
2730: 0.080435606 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=3145728,blksize=32768 },0x0) =
0 (0x0)
2730: 0.081043212 write(1," fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
2730: 0.081286025 munmap(0x48c00000,3145728) = 0 (0x0)
2730: 0.081493526 close(3) = 0 (0x0)
I wrote a small C program and in the trace it looks completely broken, but no EINVAL:
posix_fallocate(0x3,0x400,0x0,0x0,0x400,0x0) = 0 (0x0)
vs our Go
posix_fallocate(0x3,0x0,0x0,0x400,0x0,0x0) ERR#22 'Invalid argument'
Looks like the C and Go programs pass arguments in different order? What arguments do your C program pass?
#include <unistd.h>
#include <fcntl.h>
int main() {
int fd = open("file.dat", O_RDWR|O_CREAT, 0644);
posix_fallocate(fd, 0, 1024);
posix_fallocate(fd, 0, 1<<20);
close(fd);
}
amd64:
openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644) = 3 (0x3)
posix_fallocate(0x3,0x0,0x400) = 0 (0x0)
posix_fallocate(0x3,0x0,0x100000) = 0 (0x0)
close(3) = 0 (0x0)
i386:
openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644) = 3 (0x3)
freebsd32_posix_fallocate(0x3,0x0,0x0,0x400,0x0) = 0 (0x0)
freebsd32_posix_fallocate(0x3,0x0,0x0,0x100000,0x0) = 0 (0x0)
close(3) = 0 (0x0)
armv7:
openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644) = 3 (0x3)
posix_fallocate(0x3,0x400,0x0,0x0,0x400,0x0) = 0 (0x0)
posix_fallocate(0x3,0x100000,0x0,0x0,0x100000,0x0) = 0 (0x0)
close(3) = 0 (0x0)
The i386 looks correct, but on arm it is as if the low qword of the len is passed as the second argument. I could be just a truss bug though.
Change https://go.dev/cl/481621 mentions this issue: internal/syscall/unix: fix PosixFallocate syscall argument order on freebsd/386
Something still doesn't make sense. The freebsd-riscv64-unmatched builder exhibits the ZFS block size test failure, but the system call there presumably returns EINVAL (as seen when running the test manually on an amd64 with a ZFS filesystem).
And for Go to match the C truss output, we'd need to do
syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))
https://man.freebsd.org/cgi/man.cgi?query=posix_fallocate&sektion=2&n=1 says:
If successful,
posix_fallocate()
returns zero. It returns an error on failure, without setting errno.
So I think the syscall wrapper needs to be checking the r1
result from the call for the error, rather than the err
result.
And for Go to match the C truss output, we'd need to do syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))
@tklauser there's a comment in about alignment in syscall/mksyscall.pl
:
https://github.com/golang/go/blob/f62c9701b4bc61da6a5f4db8ef81d816f112430e/src/syscall/mksyscall.pl#L216-L227
Is that where the 0
in your new CL coming from?
And then truss or clang just use the actual value as an optimization?
So I think the syscall wrapper needs to be checking the
r1
result from the call for the error, rather than theerr
result.
@bcmills you're right! We can all blame OpenGroup for this :/
https://man.freebsd.org/cgi/man.cgi?query=posix_fallocate&sektion=2&n=1 says:
If successful,
posix_fallocate()
returns zero. It returns an error on failure, without setting errno.So I think the syscall wrapper needs to be checking the
r1
result from the call for the error, rather than theerr
result.
@bcmills Right, I'll adjust the CL.
And for Go to match the C truss output, we'd need to do syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))
@tklauser there's a comment in about alignment in
syscall/mksyscall.pl
:Is that where the
0
in your new CL coming from?
Yes, I think so. Though I was using mksyscall.go
from x/sys/unix
, but that has the same logic:
https://cs.opensource.google/go/x/sys/+/master:unix/mksyscall.go;l=237-247
And then truss or clang just use the actual value as an optimization?
I'm not sure how truss and and clang implement this :(
https://build.golang.org/log/81f8a8d0ec2511d322c9e2f8d9ad3b170fc33866:
Looks like the same failure mode as #42005, but I'm filing a separate issue because this may be resolved on this particular builder by either reconfiguring the builder or adding a test skip.
(attn @mengzhuo; CC @golang/freebsd)