Closed neild closed 7 months ago
I found the following survey of keep-alive behaviors to be useful: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
That post also covers the TCP_USER_TIMEOUT
socket option, which configures another aspect of how dead TCP connections are detected. I'm not certain if anything other than Linux supports that option, so I've left it out of this proposal.
Presumably SetKeepAliveConfig
would return a non-nil error if the configured keepalive behavior can't be provided on the current platform?
Presumably SetKeepAliveConfig would return a non-nil error if the configured keepalive behavior can't be provided on the current platform?
That was my thought. If a caller doesn't mind a platform not supporting an option, they can just not check the returned error.
Presumably
SetKeepAliveConfig
would return a non-nil error if the configured keepalive behavior can't be provided on the current platform?
Maybe we should return an error that wraps errors. ErrUnsupported
from #41198
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
This seems reasonable, well-motivated, and well-justified. Thank you. I think you can drop the "and KeepAliveConfig.Enable is false" in the KeepAlive docs, since it already said the field is ignored when Enable==true.
It would help to say what the current defaults are in the config struct docs.
And in Count maybe "should be sent" should be "can go unanswered".
Updated proposal with the suggested changes.
I needed to pick default values for KeepAliveConfig fields. I went with the current behavior for Idle and Interval (15 seconds), and the Linux kernel default for Count (9). We could also say that the default for Count matches the current behavior of picking the OS default, but it seems less surprising for us to set all three.
package net
// KeepAliveConfig contains TCP keep-alive options.
//
// If the Idle, Interval, or Count fields are zero, a default value is chosen.
// If a field is negative, the operating system default is used.
type KeepAliveConfig struct {
// If Enable is true, keep-alive probes are enabled.
Enable bool
// Idle is the time that the connection must be idle before
// the first keep-alive probe is sent.
// If zero, a default value of 15 seconds is used.
Idle time.Duration
// Interval is the time between keep-alive probes.
// If zero, a default value of 15 seconds is used.
Interval time.Duration
// Count is the maximum number of keep-alive probes that
// can go unanswered before dropping a connection.
// If zero, a default value of 9 is used.
Count int
}
type Dialer { // contains existing unchanged fields
// KeepAlive specifies the interval between keep-alive
// probes for an active network connection.
//
// KeepAlive is ignored if KeepAliveConfig.Enable is true.
//
// If zero, keep-alive probes are sent with a default value
// (currently 15 seconds), if supported by the protocol and operating
// system. Network protocols or operating systems that do
// not support keep-alives ignore this field.
// If negative, keep-alive probes are disabled.
KeepAlive time.Duration
// KeepAliveConfig specifies the keep-alive probe configuration
// for an active network connection, when supported by the
// protocol and operating system.
//
// If KeepAliveConfig.Enable is true, keep-alives probes are enabled.
// If KeepAliveConfig.Enable is false and KeepAlive is negative,
// keep-alive probes are disabled.
KeepAliveConfig KeepAliveConfig
}
type ListenConfig {
// same fields as Dialer above
}
// SetKeepAliveConfig configures keep-alive messages sent by the operating system.
func (c *TCPConn) SetKeepAliveConfig(config KeepAliveConfig) error {}
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
Proposal details are in https://github.com/golang/go/issues/62254#issuecomment-1791102281.
No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group
Proposal details are in https://github.com/golang/go/issues/62254#issuecomment-1791102281.
I've done some investigations about TCP keep-alive on various platforms and had a CL for this proposal drafted, I'd volunteer to implement this if you don't mind.
Change https://go.dev/cl/542275 mentions this issue: net: add KeepAliveConfig and implement SetKeepAliveConfig
Thanks everyone for the discourse! Looks like it is too late in the cycle to get in, forward rolling to Go1.23.
Thanks everyone for the discourse! Looks like it is too late in the cycle to get in, forward rolling to Go1.23.
Actually, CL 542275 had gained all votes and been ready to be submitted the day before Go 1.22 release freeze, it just didn't get a chance to be submitted somehow. Given that the pain point in #48622 has been reported for over two years and the corresponding CL did finish the work on schedule, I'd like to propose a freeze exception of Go1.22 for this, cc @golang/release. Thanks!
TCP keep-alives are enabled by setting the
SO_KEEPALIVE
socket option. When keep-alives are enabled, the operating system will probe idle connections and close connections with an unresponsive peer.Linux, FreeBSD, and Windows permit configuring keep-alive probes with the
TCP_KEEPIDLE
,TCP_KEEPINTVL
, andTCP_KEEPCNT
socket options. (Darwin appears to callTCP_KEEPIDLE
TCP_KEEPALIVE
.) These options set the idle duration before the first probe is sent, the interval between probes, and the maximum number of probes to send before declaring a connection dead, respectively.The
net
package permits configuring TCP keepalive behavior with:Dialer.KeepAlive
ListenConfig.KeepAlive
TCPConn.SetKeepAlive
TCPConn.SetKeepAlivePeriod
Setting the keep-alive period sets both
TCP_KEEPIDLE
andTCP_KEEPINTVL
. There is no way in thenet
package to setTCP_KEEPCNT
or otherwise configure how long it takes to declare a connection dead. There is no way to set different values for the idle period and the probe interval.Since Go 1.12 (https://go.dev/cl/107196), the
net
package has setTCP_KEEPIDLE
andTCP_KEEPINTVL
on all new sockets to 15 seconds by default.This 15 second value is not appropriate for all uses. For example, #48622 complains that the default keep-alive settings cause excessive CPU consumption on mobile devices due to frequent probes. Reducing the time until the first keep-alive probe is sent with
Dialer.KeepAlive
also reduces the time between probes; setting the period to five minutes will cause the kernel to take ~50 minutes to detect a dead connection.I propose that we change the
net
package functions which set the keep-alive period to setTCP_KEEPIDLE
, but notTCP_KEEPINTVL
. It is surprising and not useful for a "keep-alive period" of ten minutes to translate into a connection being declared dead only after several hours of unresponsiveness.I propose that we also add a finer-grained API for configuring keep-alive behavior. I haven't exhaustively surveyed the operating systems we support, but Linux, Windows, Darwin, and at least one BSD all appear to support the same three settings. It's possible to set these via
golang.org/x/sys
, but providing a common way to configure them in thenet
package seems reasonable.The following API also permits for enabling keep-alives on a connection while using the operating system default keep-alive configuration (set
KeepAliveConfig.Enable
, set the other fields to -1).