golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.08k stars 17.68k forks source link

net: Dial is slow on localhost #23366

Closed levrik closed 1 year ago

levrik commented 6 years ago

See https://github.com/golang/go/issues/23366#issuecomment-374397983 for more details!

What version of Go are you using (go version)?

go version go1.9.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64 set GOBIN= set GOEXE=.exe set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GORACE= set GOROOT=C:\Go set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config

What did you do?

now := time.Now()
net.Dial("tcp", "localhost:3000")
fmt.Println(time.Since(now))

This code outputs around 1s. When replacing localhost with 127.0.0.1 it takes around 2ms, sometimes also <1ms. google.de:80 is around 60ms.

Is this expected to take so much longer with localhost? Took me several hours to find out that this was the root cause of multiple issues in my application.

davecheney commented 6 years ago

How fast is it if you dial 127.0.0.1:3000 ?

On 8 Jan 2018, at 07:05, Levin Rickert notifications@github.com wrote:

What version of Go are you using (go version)?

go version go1.9.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64 set GOBIN= set GOEXE=.exe set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GORACE= set GOROOT=C:\Go set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config

What did you do?

now := time.Now() net.Dial("tcp", "localhost:3000") fmt.Println(time.Since(now)) This code outputs around 1s. When replacing localhost with 127.0.0.1 it takes around 2ms. google.de:80 is around 60ms.

Is this expected to take so much longer with localhost? Took me several hours to find out that this was the root cause of multiple issues in my application.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

davecheney commented 6 years ago

Oh, sorry, you already answered this.

I don’t think there is anything that can be done, localhost is a dns name so it goes through the usual dns lookup channels.

Do you have access to a machine on a different network or with a different networking stack? Can you try to get some comparison results?

On 8 Jan 2018, at 07:05, Levin Rickert notifications@github.com wrote:

What version of Go are you using (go version)?

go version go1.9.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64 set GOBIN= set GOEXE=.exe set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GORACE= set GOROOT=C:\Go set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config

What did you do?

now := time.Now() net.Dial("tcp", "localhost:3000") fmt.Println(time.Since(now)) This code outputs around 1s. When replacing localhost with 127.0.0.1 it takes around 2ms. google.de:80 is around 60ms.

Is this expected to take so much longer with localhost? Took me several hours to find out that this was the root cause of multiple issues in my application.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

levrik commented 6 years ago

@davecheney As I said google.de:80 is working fast (60ms) so I don't think it's an DNS issue. I will continue to investigate.

levrik commented 6 years ago

@davecheney Just tried under macOS, same network. 2ms for localhost:3000 and 220µs for 127.0.0.1:3000. Maybe an Windows issue?

Edit: Tested in WSL. About 800µs for localhost:3000 and 200µs for 127.0.0.1:3000. Not sure if requests from WSL go through the Windows DNS stack.

bradfitz commented 6 years ago

Ephemeral port exhaustion maybe?

davecheney commented 6 years ago

My gut says a timeout due to a dns server refusing to looking "localhost" or balckholing the response.

On 8 January 2018 at 08:15, Brad Fitzpatrick notifications@github.com wrote:

Ephemeral port exhaustion maybe?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/23366#issuecomment-355853243, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcA5XFp7xaoX6v9Plis6nLjxg5ibg_ks5tITPpgaJpZM4RVxBt .

agnivade commented 6 years ago

@levrik - Perhaps you can write a similar program in a different programming language and test it in the same box ? That will eliminate whether its an issue with your networking stack or with Go.

davecheney commented 6 years ago

That's a great idea. Another option, which might prove cheaper if it is conclusive is use ping or something and time (with a stopwatch, unless ping supports a one shot mode on windows) how long it takes for the first line to appear.

On 8 January 2018 at 15:21, Agniva De Sarker notifications@github.com wrote:

@levrik https://github.com/levrik - Perhaps you can write a similar program in a different programming language and test it in the same box ? That will eliminate whether its an issue with your networking stack or with Go.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/23366#issuecomment-355882122, or mute the thread https://github.com/notifications/unsubscribe-auth/AAAcA5Zw5oFuvyzogm8ELGrvcSbtJ3y9ks5tIZfggaJpZM4RVxBt .

as commented 6 years ago

(with a stopwatch, unless ping supports a one shot mode on windows)

ping -n 1 localhost

levrik commented 6 years ago

@davecheney @agnivade Let me try later this evening.

levrik commented 6 years ago

Sorry for not responding. Just tried a ping to localhost. Took below 1ms.

kkinnunen commented 6 years ago

My connections were taking exactly 1s each with

Worked around with using dialing and listening on "tcp", 127.0.0.1:port

net.Listen network (tcp, tcp4) and address of ":port", "localhost:port", "127.0.0.1:port" seem affect the outcome (to my adhoc testing). For example:

go version go version go1.10 windows/amd64

ewxrjk commented 6 years ago

I encountered the same issue, and I've done some digging. The first clue was that if I tried to connect to a port that has no listener at all the delay was 2 seconds, not 1 second.

The delay isn't in the name lookup. But the results of the name lookup do matter: here, localhost results in two addresses, ::1 and 127.0.0.1. In this case net.Dial attempts to connect to ::1 first and 127.0.0.1 second.

The delay arises from failed connection attempts. Specifically, the call to fd.pfd.ConnectEx takes almost exactly a second when it fails (for any address, not just local ones). You can see this by modifying the implementation to print before/after timestamps.

In the localhost case, if you're listening on both IPv4 and IPv6, then it succeeds immediately. If you're listening on just IPv4 then you get failed connection attempted and a 1s delay followed by success. If you're listening on neither then you get two failed connection attempts and thus a 2s delay.

The same issue affects PuTTY. tcpdump reveals that what's going on is that Windows apparently does not believe the RST bits it gets back from the target of the connection (even if, apparently, it is itself the target) and keeps trying for up to a second.

I doubt that Go can do anything about this. For application developers the answer is to bind to both IPv4 and IPv6 addresses, so that it doesn't matter what connection order clients use.

A couple of references:

levrik commented 6 years ago

Ah, that's really interesting. The issue I have here is that I'm not in control of the target servers. I built a development proxy which also proxies websockets besides normal HTTP. If you use, for example, create-react-app which uses sockjs for the connection to its live reload server it will fall back first to eventsource and after that to polling. that is caused by some internal dynamic timeout that is calculated based on a normal HTTP call (which is fast). I think it is fine to just use 127.0.0.1 in that case, at least for now.

agnivade commented 6 years ago

@levrik - So, would it be fair to say that this issue can be closed ?

gopherbot commented 6 years ago

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

levrik commented 6 years ago

@agnivade Oh sorry. Completely forgot to anwser. Yes. Maybe it'd be an idea to put information about that into the Go docs?

agnivade commented 6 years ago

A PR will be much appreciated.

/cc @bradfitz

inliquid commented 5 years ago

Got same here with http package and some profiling tests, f.e. this code seemed broken and program just hanged until localhost was changed to 127.0.0.1

func main() {
    ctrl.Setup()

    go func() {
        if err := http.ListenAndServe("localhost:3000", &poms.GZipServer{}); err != nil {
            log.Fatalln("Error listening:", err)
        }
    }()

    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatalln("Can't create profiler dump file:", err)
    }

    pprof.StartCPUProfile(f)

    for i := 0; i < 1000; i++ {
        if _, err := http.Get("http://localhost:3000/api/purchaseOrders/1"); err != nil {
            log.Println("Error sending request:", err)
        }
    }

    pprof.StopCPUProfile()
}

Note, that in this example both listener and client supposed to bind to same host/interface: localhost, and there is no way to get an idea of what is happening. No errors, no feedback on failed [::1] attempts, what so ever. After replacing localhost with 127.0.0.1 everything worked.

I expect to receive some errors back from Get when there are timed out or refused connect attempts on different interfaces, including attempts to connect on IPv6/IPv4.

as commented 5 years ago

@inliquid you have a transactional race in your program.

There is no guarantee that by the time your listener goroutine resolves the host part of the socket and starts the http.ListenAndServe logic, your client hasn't already tried 1000 times to connect and failed.

I wouldn't be surprised if changing it to the IPv4 address made this process quicker, allowing your server to listen before the clients started clienting.

Another question is how you determine success. Nothing happens if the client relieves no error.

I suggest creating your own listener and attaching it to the http server in two separate steps. This would allow you to implement your own net.Listener and net.Conn for the connections and add whatever debugging output you want.

inliquid commented 5 years ago

@as well, in fact I don't have race problem. I could have such problem as you suggested, but I don't. If there were any - I would see an error as a result. I did a lot of experiments before realized what is happening, including running server and client in 2 separate processes and emulating http server to see is http.Get even working. Problem is that http.Get returns nothing in this case.

ewxrjk commented 5 years ago

@inliquid there is not much Go can do about the connect issue. It is just reflecting the behavior of the underlying platform. The best strategies are either (1) bind to both loopback addresses, or (2) bind to just one of them but also only connect to that one.

In general binding to names rather than addresses is risky: if the name resolves to multiple addresses then it will bind to only one of those addresses. There's already a warning about this under net.Listen.

inliquid commented 5 years ago

I agree in general, however, I think it's clear that localhost in Windows is exceptional case and since it has nothing to do with DNS or hosts I think it must be treated in exceptional manner, explicitly, avoiding such situations were program behavior may change unexpectedly. Maybe just make "localhost" == "127.0.0.1", or even deny "localhost" and require explicit address, smth like

EllieLockhart commented 4 years ago

Is this still an issue of concern, and if so do we have potential courses of action? Is this a situation where at least a note in documentation should be made for the time being?

ewxrjk commented 4 years ago

@EllieLockhart the root cause is in Microsoft's TCP stack and only they can fix it.

There are some possible application/operator responses and they could be documented in https://golang.org/pkg/net/ (or elsewhere).

inliquid commented 4 years ago

@ewxrjk LOL. Also you can ask Microsoft to "fix" backslash to slash as the path separator.

There is nothing to "fix" from Microsoft part, its stack behaves as expected. The problem is Go standard library which denies to treat this behavior in a right way not forcing users to apply hacks and workarounds.

artemsnisarenko commented 2 years ago

I'll just leave this here https://github.com/openjdk/jdk/blob/be4f25b0c885a6405b484e4035b30005311a2802/src/java.base/windows/native/libnio/ch/Net.c#L240-L243

qmuntal commented 1 year ago

I'll just leave this here https://github.com/openjdk/jdk/blob/be4f25b0c885a6405b484e4035b30005311a2802/src/java.base/windows/native/libnio/ch/Net.c#L240-L243

Thanks for the pointer @artemsnisarenko. I've submitted CL 23366 based on that.

gopherbot commented 1 year ago

Change https://go.dev/cl/495875 mentions this issue: net: make Dial fail faster on Windows closed loopback devices