golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

syscall: Sendfile needs documentation #64044

Open bcmills opened 11 months ago

bcmills commented 11 months ago

As of Go 1.21, the syscall.Sendfile function has no documentation.

For many functions in the syscall package, we assume POSIX semantics in the absence of explicit documentation. However, sendfile is not defined by POSIX, and its semantics vary significantly among platforms.

Notably:

- On Linux, “sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred”. FreeBSD, macOS, and Solaris do not document any such restriction.
- The reporting of the actual number of bytes transferred varies by platform.
  - It appears that the return value from Go's syscall.Sendfile on FreeBSD and macOS always reports the *sbytes (a.k.a. len) out-parameter, which is always nonnegative. On Linux and Solaris, it reports the return value from the call, which is -1 on error.
- The effect on the offset of the input file varies by platform.
- The allowed output descriptors vary by platform.

In addition, on Solaris and Illumos it appears that EAGAIN can be returned for reasons other than full send buffers — it can also occur due to file or record locking on the input or output file.

Given these variations, it seems to me that the semantics and usage of the Go syscall wrapper should be documented — especially given that the signature of Go's syscall.Sendfile on FreeBSD and macOS doesn't match the signature of the corresponding system C function.
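
A minimal sketch of how a caller might stay within the differences listed above: it caps each request at the Linux limit quoted earlier, keeps its own position instead of relying on the kernel to update the offset, and advances by whatever byte count the call reports. The names `sendAll`, `clampCount`, `sendfileFunc`, and `errNoProgress` are hypothetical. The function type only mirrors the signature of `syscall.Sendfile` so the loop can be exercised with a stub, and the error handling is deliberately simplistic.

```go
package main

import "errors"

// maxSendfileChunk is the per-call ceiling that the Linux man page quotes.
// Other platforms do not document a limit; capping everywhere is an
// assumption made here for simplicity.
const maxSendfileChunk = 0x7ffff000

// sendfileFunc mirrors the signature of syscall.Sendfile so that a stub can
// be substituted for the real system call. (Hypothetical helper type.)
type sendfileFunc func(outfd, infd int, offset *int64, count int) (written int, err error)

// errNoProgress is a hypothetical sentinel for "zero bytes, no error",
// which typically means the input ended before size bytes were sent.
var errNoProgress = errors.New("sendfile made no progress")

// clampCount limits one request to maxSendfileChunk bytes.
func clampCount(remaining int64) int {
	if remaining > maxSendfileChunk {
		return maxSendfileChunk
	}
	return int(remaining)
}

// sendAll tries to send size bytes starting at offset start. It passes a
// fresh copy of the position on every call, so it does not matter whether
// the platform updates the offset it was given.
func sendAll(send sendfileFunc, outfd, infd int, start, size int64) (int64, error) {
	var total int64
	for total < size {
		pos := start + total
		n, err := send(outfd, infd, &pos, clampCount(size-total))
		if n > 0 {
			// Count reported bytes even when an error accompanies them.
			total += int64(n)
		}
		if err != nil {
			return total, err
		}
		if n <= 0 {
			return total, errNoProgress
		}
	}
	return total, nil
}
```

On a Unix platform, `syscall.Sendfile` itself should be usable as the `send` argument, for example `sendAll(syscall.Sendfile, outfd, infd, 0, size)`.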


bcmills commented 11 months ago

(CC @panjf2000)

paulzhol commented 11 months ago

> FreeBSD does not document whether the file offset of fd is modified by the call. (I'm guessing that it's not, though.)

I also don't think it modifies the file offset of fd. I didn't find any fo_seek() calls in vn_sendfile(); however, linux_sendfile_common() does make them. That function is part of the Linuxulator (the Linux emulation / binary compatibility layer). That code also carries the following comment:

```C
/*
 * Differences between FreeBSD and Linux sendfile:
 * - Linux doesn't send anything when count is 0 (FreeBSD uses 0 to
 *   mean send the whole file.) In linux_sendfile given fds are still
 *   checked for validity when the count is 0.
 * - Linux can send to any fd whereas FreeBSD only supports sockets.
 *   The same restriction follows for linux_sendfile.
 * - Linux doesn't have an equivalent for FreeBSD's flags and sf_hdtr.
 * - Linux takes an offset pointer and updates it to the read location.
 *   FreeBSD takes in an offset and a 'bytes read' parameter which is
 *   only filled if it isn't NULL. We use this parameter to update the
 *   offset pointer if it exists.
 * - Linux sendfile returns bytes read on success while FreeBSD
 *   returns 0. We use the 'bytes read' parameter to get this value.
 */
```

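
The translation that this comment describes can be modeled roughly in Go. This is only an illustrative sketch, not the kernel code: `bsdSendfile` is a simplified, hypothetical stand-in for a FreeBSD-style call (no `sf_hdtr` or flags), `linuxStyle` is a hypothetical name, and the case of a NULL offset pointer is left out.

```go
package main

// bsdSendfile is a simplified, hypothetical model of a FreeBSD-style call:
// the offset is passed by value, the byte count comes back through sbytes,
// and success is signaled by a nil error (the C call returns 0).
type bsdSendfile func(infd, sock int, offset int64, nbytes int, sbytes *int64) error

// linuxStyle wraps a FreeBSD-style call so that it behaves the way the
// comment describes Linux: count 0 sends nothing, the offset pointer is
// advanced, and the number of bytes is the return value.
// Assumption: offset is non-nil; the NULL-offset case is not modeled.
func linuxStyle(bsd bsdSendfile, outfd, infd int, offset *int64, count int) (int, error) {
	if count == 0 {
		// Passing 0 through would ask FreeBSD to send the whole file.
		// (The real code still validates the descriptors here.)
		return 0, nil
	}
	var sbytes int64
	if err := bsd(infd, outfd, *offset, count, &sbytes); err != nil {
		// Linux reports failure as -1; leaving the offset untouched on
		// error is an assumption of this sketch.
		return -1, err
	}
	*offset += sbytes // emulate Linux updating the offset pointer
	return int(sbytes), nil
}
```
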
panjf2000 commented 11 months ago

Thank you for bringing this up, @bcmills.

As the Linux man pages state, sendfile(2) on Linux is indeed implemented differently from other UNIX systems.

As for partial writes: on BSD-like OSes, sendfile() may send fewer bytes than requested and still fail with EAGAIN or EINTR. On Linux, a successful yet incomplete call to sendfile returns no error, because EAGAIN from sendfile should only happen in the "zero bytes sent" case, as with other read/write-like system calls.
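
One way a caller might handle both conventions is to count the reported bytes first and only then inspect the error. The sketch below assumes that approach; `account`, `classify`, and the `action` values are hypothetical names, and the mapping from errno to next step is a simplification.

```go
package main

import "syscall"

// Short aliases for the two errno values discussed above.
var (
	eagain = syscall.EAGAIN
	eintr  = syscall.EINTR
)

// action is a hypothetical description of what the caller does next.
type action int

const (
	actContinue action = iota // no error: keep sending if bytes remain
	actRetry                  // EINTR: call sendfile again
	actWait                   // EAGAIN: wait before retrying (usually for writability)
	actFail                   // anything else: give up
)

// classify maps the error from one sendfile call to the next step.
func classify(err error) action {
	switch err {
	case nil:
		return actContinue
	case eintr:
		return actRetry
	case eagain:
		return actWait
	default:
		return actFail
	}
}

// account adds the bytes reported by one call to the running total before
// looking at the error. That covers the BSD-style "partial progress plus
// EAGAIN/EINTR" case as well as the Linux-style "EAGAIN means zero bytes"
// case, and it ignores a negative count reported on failure.
func account(sent int64, written int, err error) (int64, action) {
	if written > 0 {
		sent += int64(written)
	}
	return sent, classify(err)
}
```

With this shape, a BSD-style result of 40 bytes plus EAGAIN moves the total forward by 40 and asks the caller to wait, while a Linux-style EAGAIN leaves the total unchanged.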

Another implementation detail worth mentioning: since kernel v2.6.23, sendfile(2) on Linux has used splice(2) under the hood to do the zero-copy work, which might help us better understand the behavior of sendfile(2).

gopherbot commented 11 months ago

Change https://go.dev/cl/546295 mentions this issue: syscall: document Sendfile with semantics and usage

gopherbot commented 11 months ago

Change https://go.dev/cl/537275 mentions this issue: internal/poll: revise the determination about [handled] and improve the code readability for SendFile