golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.12k stars 17.68k forks source link

cmd/link/internal/ld: TestFallocate consistently failing on freebsd-riscv64-unmatched #59352

Closed bcmills closed 1 year ago

bcmills commented 1 year ago

https://build.golang.org/log/81f8a8d0ec2511d322c9e2f8d9ad3b170fc33866:

--- FAIL: TestFallocate (0.00s)
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 2048
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 4096
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 6144
FAIL
FAIL    cmd/link/internal/ld    45.438s

Looks like the same failure mode as #42005, but I'm filing a separate issue because this may be resolved on this particular builder by either reconfiguring the builder or adding a test skip.

(attn @mengzhuo; CC @golang/freebsd)

cherrymui commented 1 year ago

Fallocate is newly enabled on FreeBSD at CL https://golang.org/cl/478035 . cc @tklauser

paulzhol commented 1 year ago

ZFS on FreeBSD does not support posix_fallocate. The syscall should return EINVAL according to man posix_fallocate(2):

... or the operation is not supported by the file system.

The reason for removal of posix_allocate is also explained here: https://svnweb.freebsd.org/base?view=revision&revision=325320.

All the other builders use UFS2 AFIK, if freebsd-riscv64-unmatched is indeed using ZFS, then we're not handling the error properly. The test checks for syscall.EOPNOTSUPP: https://github.com/golang/go/blob/master/src/cmd/link/internal/ld/fallocate_test.go#L29-L31

Maybe we should test for errNoFallocate instead, like the one outbuf_nofallocate.go is returning (plus an explicit EINVAL check in outbuf_freebsd.go converting it to errNoFallocate)?

paulzhol commented 1 year ago

I'm also seeing it on my builder which was down for a week. It is a slightly different failure mode (the filesystem is UFS1 in my case):

--- FAIL: TestFallocate (0.00s)
    fallocate_test.go:60: unexpected disk usage: got 128 blocks, want at least 2048
    fallocate_test.go:60: unexpected disk usage: got 192 blocks, want at least 4096
    fallocate_test.go:60: unexpected disk usage: got 256 blocks, want at least 6144
FAIL
FAIL    cmd/link/internal/ld    45.338s
FAIL
bcmills commented 1 year ago

The syscall should return EINVAL

That's unfortunate, but common enough that testenv.SyscallIsNotSupported checks for it explicitly: https://cs.opensource.google/go/go/+/master:src/internal/testenv/testenv_unix.go;l=31-34;drc=75c2e97c3c11050847fc221480e1939f29c911e5

So probably the test could use that if the system call actually returned EINVAL as documented.

Unfortunately, the failure mode we're seeing here is not consistent with an error return from the syscall. There is an explicit error check before the unexpected disk usage check: https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L35-L37

bcmills commented 1 year ago

I think this is a platform-independent bug in the test (CC @golang/runtime). The size being passed to fallocate is only 1 KiB: https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L28

Then the sizes passed to mmap are 1–3 MiB and the test is assuming that the number of reserved blocks is increased accordingly: https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L43 https://github.com/golang/go/blob/8edcdddb23c6d3f786b465c43b49e8d9a0015082/src/cmd/link/internal/ld/fallocate_test.go#L59

But without subsequent calls to fallocate, it's not at all clear to me why we should expect the filesystem to have allocated more blocks after the mmap, if the filesystem may use copy-on-write blocks.

cherrymui commented 1 year ago

out.Mmap does fallocate https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/outbuf_mmap.go;l=23 (out.Mmap is not just a plain syscall)

bcmills commented 1 year ago

Hmm, looks like (*OutBuf).Mmap is missing casses for EOPNOTSUPP and EINVAL, at least. 🤔 https://cs.opensource.google/go/go/+/master:src/cmd/link/internal/ld/outbuf_mmap.go;l=31;drc=058f019e45fc1f18842e0339d7d56d804936263b

But that still wouldn't produce the observed failure mode in the test.

cherrymui commented 1 year ago

Last time I looked, on all relevant platforms ENOTSUP and EOPNOTSUPP are equal. But I guess it doesn't hurt to include both. EINVAL is arguably weird -- I'd hope it fails if we actually pass a wrong argument to fallocate. But given this issue, maybe it is fine. (Why FreeBSD/ZFS doesn't return ENOTSUP or EOPNOTSUPP? ...)

paulzhol commented 1 year ago

Attached the truss syscall trace from a tmpfs flesystem as used on the main FreeBSD 13.1 amd64/i386 builders:

70343: 0.007439040 posix_fallocate(0x3,0x0,0x400) = 0 (0x0)
70343: 0.007685140 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.007693940 posix_fallocate(0x3,0x0,0x100000) = 0 (0x0)
70343: 0.007722300 ftruncate(3,0x100000)         = 0 (0x0)
70343: 0.007755530 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.007763790 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35785814016 (0x855003000)
70343: 0.007798940 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=1048576,blksize=4096 },0x0) = 0 (0x0)
70343: 0.007809760 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.007844200 munmap(0x855003000,1048576)   = 0 (0x0)
70343: 0.007852720 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008228670 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008237430 posix_fallocate(0x3,0x0,0x200000) = 0 (0x0)
70343: 0.008264920 ftruncate(3,0x200000)         = 0 (0x0)
70343: 0.008292320 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008300700 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35762733056 (0x853a00000)
70343: 0.008330000 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=2097152,blksize=4096 },0x0) = 0 (0x0)
70343: 0.008355660 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008363740 munmap(0x853a00000,2097152)   = 0 (0x0)
70343: 0.008897930 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008906620 posix_fallocate(0x3,0x0,0x300000) = 0 (0x0)
70343: 0.008933030 ftruncate(3,0x300000)         = 0 (0x0)
70343: 0.008959760 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.008968280 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35762733056 (0x853a00000)
70343: 0.008999030 fstatat(AT_FDCWD,"/tmp/TestFallocate3468667880/001/a.out",{ mode=-rwxr-xr-x ,inode=36604,size=3145728,blksize=4096 },0x0) = 0 (0x0)
70343: 0.009024660 nanosleep({ 0.000020000 })    = 0 (0x0)
70343: 0.009033260 munmap(0x853a00000,3145728)   = 0 (0x0)

A ZFS directory with lz4 compression enabled reproduces the test failure with a single block:

=== RUN   TestFallocate
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 2048
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 4096
    fallocate_test.go:60: unexpected disk usage: got 1 blocks, want at least 6144
--- FAIL: TestFallocate (0.00s)
FAIL

truss log (posix_fallocate returning EINVAL but ignored?):

75784: 0.007915430 posix_fallocate(0x3,0x0,0x400) ERR#22 'Invalid argument'
75784: 0.007961290 posix_fallocate(0x3,0x0,0x100000) ERR#22 'Invalid argument'
75784: 0.007987920 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.007995870 ftruncate(3,0x100000)         = 0 (0x0)
75784: 0.008030060 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35807641600 (0x8564d4000)
75784: 0.008040770 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008076570 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=1048576,blksize
=1048576 },0x0) = 0 (0x0)
75784: 0.008088000 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008141090 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008149300 write(1,"    fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008177910 munmap(0x8564d4000,1048576)   = 0 (0x0)
75784: 0.008196210 posix_fallocate(0x3,0x0,0x200000) ERR#22 'Invalid argument'
75784: 0.008205620 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008235760 ftruncate(3,0x200000)         = 0 (0x0)
75784: 0.008256360 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35796287488 (0x855a00000)
75784: 0.008266100 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008302200 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=2097152,blksize
=1048576 },0x0) = 0 (0x0)
75784: 0.008331090 nanosleep({ 0.000020000 })    = 0 (0x0)
75784: 0.008339130 write(1,"    fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008366400 munmap(0x855a00000,2097152)   = 0 (0x0)
75784: 0.008382670 posix_fallocate(0x3,0x0,0x300000) ERR#22 'Invalid argument'
75784: 0.008402730 ftruncate(3,0x300000)         = 0 (0x0)
75784: 0.008411540 nanosleep({ 0.000040000 })    = 0 (0x0)
75784: 0.008440800 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 35815161856 (0x856c00000)
75784: 0.008461170 fstatat(AT_FDCWD,"/home/yuval/Data/tmp/TestFallocate863409915/001/a.out",{ mode=-rwxr-xr-x ,inode=721019,size=3145728,blksize=1048576 },0x0) = 0 (0x0)
75784: 0.008487470 write(1,"    fallocate_test.go:60: unexpe"...,82) = 82 (0x52)
75784: 0.008504550 munmap(0x856c00000,3145728)   = 0 (0x0)
paulzhol commented 1 year ago

I don't see any calls to msync, so I think this works on a tmpfs filesystem just by coincidence as a MAP_SHARED file-backed tmpfs is actually anonymous memory.

bcmills commented 1 year ago

@paulzhol, is it possible that internal/syscall/unix.PosixFallocate is using the wrong syscall ABI for the error return? Otherwise, I don't see how it's possible for the ERR#22 from posix_fallocate to be ignored.

(The initial call to posix_fallocate with argument 0x400 is the direct call to out.fallocate within TestFallocate, and its return value is checked explicitly.)

paulzhol commented 1 year ago

@bcmills internal/syscall/unix.PosixFallocate looks fine to me. But weirdly enough, truss output from the arm builder also returns EINVAL for posix_allocate on a UFS1 filesystem which has fallocate:

2730: 0.072921729 posix_fallocate(0x3,0x0,0x0,0x400,0x0,0x0) ERR#22 'Invalid argument'
 2730: 0.073163969 posix_fallocate(0x3,0x0,0x0,0x100000,0x0,0x0) ERR#22 'Invalid argument'
 2730: 0.073726784 ftruncate(3,0x10000000000000) = 0 (0x0)
 2730: 0.075205955 mmap(0x0,1048576,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 682622976 (0x28b00000)
 2730: 0.076058405 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=1048576,blksize=32768 },0x0) = 
0 (0x0)
 2730: 0.076769501 write(1,"    fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
 2730: 0.077020856 munmap(0x28b00000,1048576)    = 0 (0x0)
 2730: 0.077247732 posix_fallocate(0x3,0x0,0x0,0x200000,0x0,0x0) ERR#22 'Invalid argument'
 2730: 0.077727160 ftruncate(3,0x20000000000000) = 0 (0x0)
 2730: 0.078035234 mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 1220542464 (0x48c00000)
 2730: 0.078363100 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=2097152,blksize=32768 },0x0) = 
0 (0x0)
 2730: 0.078878674 write(1,"    fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
 2730: 0.079122998 munmap(0x48c00000,2097152)    = 0 (0x0)
 2730: 0.079356228 posix_fallocate(0x3,0x0,0x0,0x300000,0x0,0x0) ERR#22 'Invalid argument'
 2730: 0.079831698 ftruncate(3,0x30000000000000) = 0 (0x0)
 2730: 0.080122792 mmap(0x0,3145728,PROT_READ|PROT_WRITE,MAP_SHARED,3,0x0) = 1220542464 (0x48c00000)
 2730: 0.080435606 fstatat(AT_FDCWD,"/tmp/TestFallocate3866123553/001/a.out",{ mode=-rwxr-xr-x ,inode=36238,size=3145728,blksize=32768 },0x0) = 
0 (0x0)
 2730: 0.081043212 write(1,"    fallocate_test.go:60: unexpe"...,84) = 84 (0x54)
 2730: 0.081286025 munmap(0x48c00000,3145728)    = 0 (0x0)
 2730: 0.081493526 close(3)                      = 0 (0x0)

I wrote a small C program and in the trace it looks completely broken, but no EINVAL:

posix_fallocate(0x3,0x400,0x0,0x0,0x400,0x0)     = 0 (0x0)

vs our Go

posix_fallocate(0x3,0x0,0x0,0x400,0x0,0x0) ERR#22 'Invalid argument'
cherrymui commented 1 year ago

Looks like the C and Go programs pass arguments in different order? What arguments do your C program pass?

paulzhol commented 1 year ago
#include <unistd.h>
#include <fcntl.h>

int main() {
    int fd = open("file.dat", O_RDWR|O_CREAT, 0644);
    posix_fallocate(fd, 0, 1024);
    posix_fallocate(fd, 0, 1<<20);
    close(fd);
}

amd64:

openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644)  = 3 (0x3)
posix_fallocate(0x3,0x0,0x400)           = 0 (0x0)
posix_fallocate(0x3,0x0,0x100000)        = 0 (0x0)
close(3)                     = 0 (0x0)

i386:

openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644)  = 3 (0x3)
freebsd32_posix_fallocate(0x3,0x0,0x0,0x400,0x0) = 0 (0x0)
freebsd32_posix_fallocate(0x3,0x0,0x0,0x100000,0x0) = 0 (0x0)
close(3)                     = 0 (0x0)

armv7:

openat(AT_FDCWD,"file.dat",O_RDWR|O_CREAT,0644)  = 3 (0x3)
posix_fallocate(0x3,0x400,0x0,0x0,0x400,0x0)     = 0 (0x0)
posix_fallocate(0x3,0x100000,0x0,0x0,0x100000,0x0) = 0 (0x0)
close(3)                     = 0 (0x0)

The i386 looks correct, but on arm it is as if the low qword of the len is passed as the second argument. I could be just a truss bug though.

bcmills commented 1 year ago

Oh! This is probably related to CL 479715 (attn @tklauser).

tklauser commented 1 year ago

Oh! This is probably related to CL 479715 (attn @tklauser).

Sorry about that. I wasn't able to test on arm at that time due to the builder being down and not having another freebsd/arm machine available. Sent CL 481621 with a fix.

gopherbot commented 1 year ago

Change https://go.dev/cl/481621 mentions this issue: internal/syscall/unix: fix PosixFallocate syscall argument order on freebsd/386

paulzhol commented 1 year ago

Something still doesn't make sense. The freebsd-riscv64-unmatched builder exhibits the ZFS block size test failure, but the system call there presumably returns EINVAL (as seen when running the test manually on an amd64 with a ZFS filesystem). And for Go to match the C truss output, we'd need to do syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))

bcmills commented 1 year ago

https://man.freebsd.org/cgi/man.cgi?query=posix_fallocate&sektion=2&n=1 says:

If successful, posix_fallocate() returns zero. It returns an error on failure, without setting errno.

So I think the syscall wrapper needs to be checking the r1 result from the call for the error, rather than the err result.

paulzhol commented 1 year ago

And for Go to match the C truss output, we'd need to do syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))

@tklauser there's a comment in about alignment in syscall/mksyscall.pl: https://github.com/golang/go/blob/f62c9701b4bc61da6a5f4db8ef81d816f112430e/src/syscall/mksyscall.pl#L216-L227 Is that where the 0 in your new CL coming from? And then truss or clang just use the actual value as an optimization?

paulzhol commented 1 year ago

So I think the syscall wrapper needs to be checking the r1 result from the call for the error, rather than the err result.

@bcmills you're right! We can all blame OpenGroup for this :/

tklauser commented 1 year ago

https://man.freebsd.org/cgi/man.cgi?query=posix_fallocate&sektion=2&n=1 says:

If successful, posix_fallocate() returns zero. It returns an error on failure, without setting errno.

So I think the syscall wrapper needs to be checking the r1 result from the call for the error, rather than the err result.

@bcmills Right, I'll adjust the CL.

And for Go to match the C truss output, we'd need to do syscall.Syscall6(posixFallocateTrap, uintptr(fd), uintptr(size>>32), uintptr(off), uintptr(off>>32), uintptr(size), uintptr(size>>32))

@tklauser there's a comment in about alignment in syscall/mksyscall.pl:

https://github.com/golang/go/blob/f62c9701b4bc61da6a5f4db8ef81d816f112430e/src/syscall/mksyscall.pl#L216-L227

Is that where the 0 in your new CL coming from?

Yes, I think so. Though I was using mksyscall.go from x/sys/unix, but that has the same logic:

https://cs.opensource.google/go/x/sys/+/master:unix/mksyscall.go;l=237-247

And then truss or clang just use the actual value as an optimization?

I'm not sure how truss and and clang implement this :(