golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.24k stars 17.7k forks source link

net: crypto/tls.(*Conn).SetReadDeadline consumes CPU #9071

Closed gopherbot closed 9 years ago

gopherbot commented 10 years ago

by david.birdsong:

What does 'go version' print?
go version go1.3.3 linux/amd64

What steps reproduce the problem?
build listener, no deps:
https://github.com/davidbirdsong/go-tls-woes/blob/master/listener.go

1. connect 8 clients to the tls port on port 5114
2. connect 1 client on the raw port on 5566
3. send basic tcp handshakes to the tls port every 3 seconds ( a tcp ping from a load
balancer)
4. send ~5Mb/s of traffic to either port for 5-40 mins

What happened?
after random intervals after start, user cpu consumes or comes close to consuming cores
for GOMAXPROCS. 
call graph shows 
tls.(*Conn).SetReadDeadline executing suspiciously during a majority of the samples
(svg) http://f.cl.ly/items/24352O0E1j3c3U3Z1G1F/tls_cpu_burn.svg

it appears that any goroutine executing tls.(*Conn).SetReadDeadline pegs a CPU core

What should have happened instead?
not sure, but user cpu should not increase so much.

Please provide any additional information below.
this code sample is ripped out of a larger server. in the larger server (heka),
throughput for the server is severely hampered by the increase in user cpu. In this
contrived example, only user CPU increases, but throughput is maintained.

This code snippet will reproduce this condition at random times. We have a TCP proxy
listening for syslog data from our vendor that connects to the TLS port. We also have
unencrypted data flowing over the raw TCP port. The trigger is unknow--sometimes it
takes 5 mins for CPU to jump and throughput to drop, other times ~30-40 minutes.

I can try to get tcpdumps if that's needed.
ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-go1.5.

bradfitz commented 9 years ago

Your svg shows handleConnection calling SetReadDeadline. And indeed, it calls it in a loop:

func handleConnection(conn net.Conn, stopChan chan bool, wg *sync.WaitGroup) {
    defer func() {
        conn.Close()
        wg.Done()

    }()
    var (
        e      error
        buffer []byte
    )
    buffer = make([]byte, 16384)

    running := true
    for running {
        e = conn.SetReadDeadline(time.Now().Add(5 * time.Second))
        select {
        case _, running = <-stopChan:
        default:
            _, e = conn.Read(buffer)
            if e != nil {
                if neterr, ok := e.(net.Error); ok && neterr.Timeout() {
                    continue
                } else {
                    running = false
                }
            }
        }
    }

}

If one of them ever times out, neterr.Timeout() will be true and you'll loop forever.

If you can still reproduce with that fixed and Go tip or Go 1.4.2, let us know.