golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.88k stars 17.65k forks source link

proposal: net: populate source address on OpError when available #44127

Open andybons opened 3 years ago

andybons commented 3 years ago

What version of Go are you using (go version)?

$ go version
go version go1.16rc1 darwin/amd64

Does this issue reproduce with the latest release?

Yeppers

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/andybons/Library/Caches/go-build"
GOENV="/Users/andybons/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/andybons/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/andybons/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.16rc1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4w/rc3rq2s55w5fswy2n8kphy5r0000gn/T/go-build575792972=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

When using net.Dial or its variants and a connection fails, it can be extremely helpful to know the source IP/port for diagnostic purposes, but it isn’t populated within the OpError. The below code shows the behavior in action:

// The Go Failing TCP Connection Challenge!
//
// Print the source IP address and port number of an outgoing TCP connection
// that times out and doesn't connect to the remote host.
//
// Rules:
//
// 1. You can't set the local address or port number.  The local address must
//    be 0.0.0.0 (INADDR_ANY), and the port number must be 0 (so the kernel
//    selects an ephemeral port).
//
// 2. You can't use the "unsafe" package.
//
// 3. Whatever you do needs to return a net.Conn.
package main

import (
    "fmt"
    "net"
    "time"
)

func main() {
    if conn, err := net.DialTimeout("tcp4", "google.com:7357", time.Second); err != nil {
        fmt.Printf("Connection timed out:\n%#v\n%q\n", err, err)
        // &net.OpError{Op:"dial", Net:"tcp4", Source:net.Addr(nil), Addr:(*net.TCPAddr)(0xc000112630), Err:(*net.timeoutError)(0x1209ce0)}
        // "dial tcp4 172.217.10.46:7357: i/o timeout" <-- shows destination but not source
    } else {
        fmt.Printf("Connection succeeded:\n%#v\n%v -> %v\n", conn, conn.LocalAddr(), conn.RemoteAddr())
    }
}

What did you expect to see?

It would be nice if the OpError populated the source IP/port when it could, or allowed for clients to get access to it (say, if there were security concerns with exposing the source IP/port in the OpError by default if it may be blindly bubbled up to clients).

For reference, this is achieved through the C program below.

#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    struct addrinfo *results;
    getaddrinfo("google.com", "7357", &hints, &results);

    int fd;
    struct addrinfo *r = results;

    fd = socket(r->ai_family, r->ai_socktype, r->ai_protocol);
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, NULL) | O_NONBLOCK);
    connect(fd, r->ai_addr, r->ai_addrlen);

    struct sockaddr_in sal;
    socklen_t sal_len = sizeof sal;
    getsockname(fd, (struct sockaddr *)&sal, &sal_len);

    char la[256], ra[256];
    strlcpy(la, inet_ntoa(sal.sin_addr), 256);
    strlcpy(ra, inet_ntoa(((struct sockaddr_in *)(r->ai_addr))->sin_addr), 256);

    printf(
      "%s:%u -> %s:%u\n",
      la, ntohs(sal.sin_port),
      ra, ntohs(((struct sockaddr_in *)(r->ai_addr))->sin_port)
    );
    return 0;
}

What did you see instead?

Only a destination IP/port in the error.

/cc @rwg-stripe @bobby-stripe @bradfitz @rsc

gopherbot commented 3 years ago

Change https://golang.org/cl/290149 mentions this issue: net: report local address on unsuccessful TCP dial attempts

rsc commented 3 years ago

If we set Source, then it will be printed in the result from Error. And most of the time the Source on Dial is meaningless - it's the local computer and a random port address. I can see wanting it for debugging against tcpdump traces or the like. But we probably don't want to print it always - it's mostly noise, and these errors do get printed a lot, like every time a connection fails. And so if we add it to Source we'd have to add some extra bool to say "don't print it in the Error result".

rsc commented 3 years ago

Also, for the Dial call in question, we don't pick nor are we aware of the source address and port used by the kernel. We call getsockname after connect, but I'm not convinced getsockname is defined to work after connect fails (nor before it executes). (The man page agrees that a failed connect should not assume anything about the socket, presumably including not calling getsockname. It may or may not return something useful at all.)

So we may not be able to provide this information anyway. Note that the C code above does call getsockname without checking that connect has succeeded.

rsc commented 3 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 3 years ago

@andybons, can you say more about when this arises?

andybons commented 3 years ago

Placing on hold while I formulate a response :)