Open bcmills opened 11 months ago
(CC @panjf2000)
FreeBSD does not document whether the file offset of fd is modified by the call. (I'm guessing that it's not, though.)
I also don't think it modifies `fd`. I didn't catch any `fo_seek()` calls in `vn_sendfile()`; however, `linux_sendfile_common()` does make them.
`linux_sendfile_common()` is part of the Linuxulator (Linux emulation / Linux binary compatibility). That code also carries the following comment:
Thank you for bringing this up, @bcmills.
As the Linux man pages state, `sendfile(2)` on Linux is indeed implemented differently from other UNIX systems.
As for partial writes: on BSD-like OSes, `sendfile()` may send fewer bytes than requested and report either `EAGAIN` or `EINTR`, whereas on Linux a successful yet incomplete call to `sendfile` returns no error, because `EAGAIN` from `sendfile` should only happen in the "zero bytes sent" case, as with other `read`/`write`-like system calls.
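To make the difference concrete, here is a minimal sketch of a retry loop that tolerates both conventions. It is not the code from either CL. `sendAll`, `sendfileFunc`, `step`, `scripted`, `interrupted`, and `wouldBlock` are hypothetical names introduced for illustration, and the sender is passed in as a function value with the same shape as `syscall.Sendfile` so that the loop can be exercised without real descriptors. It assumes that a positive `written` value always means progress, so it does not cover the Solaris case where progress is visible only through the updated offset.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// sendfileFunc has the same shape as Go's syscall.Sendfile wrapper.
type sendfileFunc func(outfd, infd int, offset *int64, count int) (written int, err error)

// maxChunk caps each call at the per-call limit documented for Linux.
const maxChunk = 0x7ffff000

// sendAll (hypothetical helper) keeps calling send until count bytes are
// sent, the input reaches EOF, or an error other than EINTR occurs.
// It returns the number of bytes known to have been sent.
func sendAll(send sendfileFunc, outfd, infd int, start int64, count int) (int, error) {
	total := 0
	for total < count {
		chunk := count - total
		if chunk > maxChunk {
			chunk = maxChunk
		}
		// Pass a fresh copy of the position so the loop does not depend on
		// whether the platform updates the offset argument.
		off := start + int64(total)
		n, err := send(outfd, infd, &off, chunk)
		if n > 0 {
			total += n // progress may be reported together with an error
		}
		switch {
		case err == nil && n == 0:
			return total, nil // no progress and no error: treat as EOF
		case err == nil, errors.Is(err, syscall.EINTR):
			continue // short write or interruption: retry the remainder
		default:
			return total, err // EAGAIN included: the caller waits for writability
		}
	}
	return total, nil
}

// step is one scripted result of a fake sender (illustration only).
type step struct {
	n   int
	err error
}

// Scripted failures in the Linux style: -1 together with an errno.
var (
	interrupted = step{n: -1, err: syscall.EINTR}
	wouldBlock  = step{n: -1, err: syscall.EAGAIN}
)

// scripted returns a fake sender that replays steps in order.
func scripted(steps []step) sendfileFunc {
	i := 0
	return func(outfd, infd int, offset *int64, count int) (int, error) {
		s := steps[i]
		i++
		return s.n, s.err
	}
}

func main() {
	// A short write, then an interruption, then the remainder.
	send := scripted([]step{{3, nil}, interrupted, {7, nil}})
	n, err := sendAll(send, 1, 0, 0, 10)
	fmt.Println(n, err)
}
```

With a real descriptor pair one would pass `syscall.Sendfile` as `send`. On `EAGAIN` the loop returns the running total so that the caller can wait for the socket to become writable and resume from `start + total`.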
Another implementation detail worth mentioning is that `sendfile(2)` on Linux has used `splice(2)` under the hood to do the zero-copy work since kernel v2.6.23, which might help us better understand the behavior of `sendfile(2)`.
Change https://go.dev/cl/546295 mentions this issue: syscall: document Sendfile with semantics and usage
Change https://go.dev/cl/537275 mentions this issue: internal/poll: revise the determination about [handled] and improve the code readability for SendFile
As of Go 1.21, the `syscall.Sendfile` function has no documentation.
For many functions in the `syscall` package, we assume POSIX semantics in the absence of explicit documentation. However, `sendfile` is not defined by POSIX, and its semantics vary significantly among platforms.
Notably:
- On Linux, “`sendfile()` will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred”. FreeBSD, macOS, and Solaris do not document any such restriction.
- The reporting of the actual number of bytes transferred varies by platform.
  - On FreeBSD, “`sendfile()` may send fewer bytes than requested” only “[w]hen using a socket marked for non-blocking I/O”. In that case, it sets the `sbytes` out-parameter to indicate the number of bytes written, returns -1, and sets errno to `EAGAIN`.
  - On Linux, “`sendfile()` may write fewer bytes than requested”, but the man page does not specify what happens to the `offset` parameter or the input file's offset on error.
  - On Solaris, “`sendfile()` may still write some data before encountering an error and returning `-1`. When that occurs, `off` is updated to point to the byte that follows the last byte copied and should be compared with its value before calling `sendfile()` to determine how much data was sent.”

  It appears that the return value from Go's `syscall.Sendfile` on FreeBSD and macOS always reports the `*sbytes` (a.k.a. `len`) out-parameter, which is always nonnegative. On Linux and Solaris, it reports the return value from the call, which is -1 on error.
- The effect on the offset of the input file varies by platform.
  - On Linux, if the `offset` parameter is null, “data will be read from `in_fd` starting at the file offset, and the file offset will be updated by the call.”
  - On Solaris, the “`sendfile()` function does not modify the current file pointer of `in_fd`, but does modify the file pointer for `out_fd` if it is a regular file.” It does not document any particular behavior if the `off` argument is null, but its error behavior seems to imply that a non-null offset pointer should always be used.
  - FreeBSD does not document whether the file offset of `fd` is modified by the call. (I'm guessing that it's not, though.)
- The allowed output descriptors vary by platform.
  - “… `AF_INET` or `AF_INET6` socket of `SOCK_STREAM` type”.
- In addition, on Solaris and Illumos it appears that `EAGAIN` can be returned for reasons other than full send buffers: it can also occur due to file or record locking on the input or output file.

Given these variations, it seems to me that the semantics and usage of the Go syscall wrapper should be documented, especially given that the signature of Go's `syscall.Sendfile` on FreeBSD and macOS doesn't match the signature of the corresponding system C function.
References: