alexcrichton / curl-rust

Rust bindings to libcurl
MIT License
1.01k stars 234 forks source link

TCP_NODELAY not being set on the socket #379

Closed tailrecur closed 3 years ago

tailrecur commented 3 years ago

Hello,

I'm building a testing-tool on top of isahc and couldn't figure out some odd behaviour around the TCP connection as detailed here and here.

I finally narrowed it to down to the TCP socket not having the option TCP_NODELAY set on it even though the Rust program explicity sets this. I used strace and knetstat to verify this. I'm able to reproduce the issue locally with the following simple example:

Setup: OS: Debian 4.19.171-2 OpenSSL 1.1.1d

Cargo.toml:

[dependencies]
curl = "0.4.34"

Rust program:

use curl::easy::Easy;

fn main() {
    let mut easy = Easy::new();
    easy.verbose(true);
    easy.tcp_nodelay(true).unwrap();
    easy.url("https://www.example.org/").unwrap();
    easy.perform().unwrap();

    println!("{}", easy.response_code().unwrap());
}

Curl command(for comparison):

# curl --version
curl 7.64.0 (x86_64-pc-linux-gnu) libcurl/7.64.0 OpenSSL/1.1.1d zlib/1.2.11 libidn2/2.0.5 libpsl/0.20.2 (+libidn2/2.0.5) libssh2/1.8.0 nghttp2/1.36.0 librtmp/2.3
Release-Date: 2019-02-06
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtmp rtsp scp sftp smb smbs smtp smtps telnet tftp
Features: AsynchDNS IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy PSL

#curl -v https://www.example.org/

Note: curl has TCP_NODELAY on by default now

Differences between cargo run and curl command

stdout

cargo run

*   Trying 93.184.216.34:443...
* Connected to www.example.org (93.184.216.34) port 443 (#0)

curl

*   Trying 93.184.216.34...
* TCP_NODELAY set
* Connected to www.example.org (93.184.216.34) port 443 (#0)

knetstat

cargo run

0    523 10.140.0.2:48106        93.184.216.34:443       ESTB >#   SO_REUSEADDR=0,SO_REUSEPORT=0,SO_KEEPALIVE=0,TCP_NODELAY=0,TCP_DEFER_ACCEPT=0

curl

0    523 10.140.0.2:48190        93.184.216.34:443       ESTB >#   SO_REUSEADDR=0,SO_REUSEPORT=0,SO_KEEPALIVE=1,TCP_KEEPIDLE=60,TCP_KEEPINTVL=60,TCP_NODELAY=1,TCP_DEFER_ACCEPT=0

strace

cargo run

# strace -f -tt cargo run 2>&1 | grep sock
11:09:49.017727 socket(AF_INET, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 5
11:09:49.144278 getsockopt(5, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
11:09:49.145171 getsockname(5, {sa_family=AF_INET, sin_port=htons(48204), sin_addr=inet_addr("10.140.0.2")}, [256->16]) = 0

curl

# strace -f -tt curl -v https://www.example.org/ 2>&1 | grep sock
11:07:42.346046 socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
11:07:42.346623 setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
11:07:42.347039 setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
11:07:42.347091 setsockopt(3, SOL_TCP, TCP_KEEPIDLE, [60], 4) = 0
11:07:42.347301 setsockopt(3, SOL_TCP, TCP_KEEPINTVL, [60], 4) = 0
11:07:42.473227 getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
11:07:42.473664 getsockname(3, {sa_family=AF_INET, sin_port=htons(48198), sin_addr=inet_addr("10.140.0.2")}, [128->16]) = 0

The curl-rust implementation here and here look the same as every other option and I've double-checked that the code 121 is correct. I'm not sure why the options are not getting set on the socket though.

Is my usage of easy.tcp_nodelay(true) incorrect in the sample program? Can you please try to reproduce this issue too?

/cc @sagebind (sort of related to https://github.com/sagebind/isahc/issues/303)

sagebind commented 3 years ago

I am not able to naively reproduce using your exact sample code on my current machine:

*   Trying 2606:2800:220:1:248:1893:25c8:1946...
* TCP_NODELAY set
* Connected to www.example.org (2606:2800:220:1:248:1893:25c8:1946) port 443 (#0)

I stepped through the code briefly and verified that libcurl was setting tcp_nodelay internally in response to the Easy::tcp_nodelay call. I can look into this deeper tonight.

tailrecur commented 3 years ago

@sagebind My gut feel is that I'm doing something wrong locally since this would be a bigger bug otherwise that would have been noticed earlier.

The only difference I can see in your example is that you are using IPv6. Would it be possible to force Ipv4 usage? I'll try to repeat this on a fresh server too.

tailrecur commented 3 years ago

I tried this on a fresh VM instance with the following spec:

root@latency-test:~/nodelay# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
root@latency-test:~/nodelay# cat /etc/debian_version
10.8
root@latency-test:~/nodelay# uname -a
Linux latency-test 4.19.0-14-cloud-amd64 #1 SMP Debian 4.19.171-2 (2021-01-30) x86_64 GNU/Linux

I still get the behaviour described above. I'm trying to figure out how to force Debian to use IPv6 resolution.

@sagebind which OS flavour are you on?

sagebind commented 3 years ago

My test was run on macOS 10.15.7. With libcurl 7.64.1.

tailrecur commented 3 years ago

@sagebind Thanks for the details!

I see the following behaviour on macOS 10.15.7 and libcurl 7.64.1.

With system curl

Cargo.toml:

curl = "0.4.34"

TCP_DELAY is set correctly

with static-curl

Cargo.toml:

curl = { version = "0.4.34", features = ["static-curl"]}

TCP_DELAY is not set.

I see no difference in output from cargo tree for both configurations. otool confirms that one uses system libcurl while the other does not.

Can you also try this with the static-curl feature turned on please?

sagebind commented 3 years ago

@tailrecur Great discovery! I ran your example with static-curl enabled and was able to reproduce. I stepped through the program with a debugger and confirmed that CURLOPT_TCP_NODELAY is still getting set, but curl was ignoring it. The reason why is revealed when you get to the definition of the internal tcpnodelay function:

https://github.com/curl/curl/blob/2f33be817cbce6ad7a36f27dd7ada9219f13584c/lib/connect.c#L1055-L1074

static void tcpnodelay(struct Curl_easy *data, curl_socket_t sockfd)
{
#if defined(TCP_NODELAY)
  curl_socklen_t onoff = (curl_socklen_t) 1;
  int level = IPPROTO_TCP;
#if !defined(CURL_DISABLE_VERBOSE_STRINGS)
  char buffer[STRERROR_LEN];
#else
  (void) data;
#endif

  if(setsockopt(sockfd, level, TCP_NODELAY, (void *)&onoff,
                sizeof(onoff)) < 0)
    infof(data, "Could not set TCP_NODELAY: %s\n",
          Curl_strerror(SOCKERRNO, buffer, sizeof(buffer)));
#else
  (void)data;
  (void)sockfd;
#endif
}

However, the TCP_NODELAY C macro is not defined for some reason in our current build process for static-curl, likely some missing headers, so libcurl just silently does nothing. I will look into this.

sagebind commented 3 years ago

It is also worth noting that looking for the TCP_NODELAY set debug line is not sufficient to know whether TCP_NODELAY is working or not because it was removed in curl 7.69.0:

https://github.com/curl/curl/commit/1b6cfb9d24cdcb822996bc23e2eb5a4cea73ec4e

tailrecur commented 3 years ago

@sagebind Thanks for the confirmation. Nice to know I wasn't dreaming this 😄

It is also worth noting that looking for the TCP_NODELAY set debug line is not sufficient to know whether TCP_NODELAY is working or not because it was removed in curl 7.69.0:

Nice to know. I was not too confident about relying on the debug logs in the original bug report and so wanted to independently confirm it with strace and knetstat just in case.

Thanks for the links to the curl code. I'll need to brush up on my C first to be able to tackle that.