caddyserver / caddy

Fast and extensible multi-platform HTTP/1-2-3 web server with automatic HTTPS
https://caddyserver.com
Apache License 2.0

Memory leaks #6322

Open KGOH opened 4 months ago

KGOH commented 4 months ago

I have a minimal config, and I constantly observe leaking memory. I was using version 2.5.0 and later upgraded to 2.7, but this didn't help.

{$GRAFANA_DOMAIN} {
  reverse_proxy grafana:3000
}

{$COLLECTOR_DOMAIN} {
  reverse_proxy myapp:8080
}

Caddy is deployed via docker compose, here's a snippet:

  caddy:
    restart: unless-stopped
    image: caddy:2.7
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
      - ./ssl:/myapp/ssl
      - caddy_data:/data
    ports:
      - 80:80
      - 443:443
    networks:
      default:
        aliases:
          - ${GRAFANA_DOMAIN}
          - ${COLLECTOR_DOMAIN}
    environment:
      GRAFANA_DOMAIN:   ${GRAFANA_DOMAIN}
      COLLECTOR_DOMAIN: ${COLLECTOR_DOMAIN}

mholt commented 4 months ago

Is it possible that the backend isn't closing connections? What is myapp:8080 serving? How can we reproduce it?

If you could please collect a memory and goroutine profile that would help answer some questions: https://caddyserver.com/docs/profiling
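
For reference, a minimal sketch of collecting those profiles with curl, assuming Caddy's admin API is on its default address (localhost:2019) and reachable from wherever you run the commands; the output file names are arbitrary:

curl -s "http://localhost:2019/debug/pprof/goroutine?debug=1" -o goroutines.txt
curl -s "http://localhost:2019/debug/pprof/heap?debug=1" -o heap.txt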

Thank you

KGOH commented 4 months ago

@mholt myapp is a REST API server. I'm currently not sure how you could reproduce it without me sharing the whole app (which I can't do).

Here are the memory and goroutine profiles: goroutines.txt, heap_debug=1.gz

Here's an example of the memory graph on the machine I got these profiles from:

mholt commented 4 months ago

Thanks -- interesting... what kind of traffic is your server getting?

The top heaps and goroutines are simply growing buffers to read from TLS connections, and serving HTTP/2, respectively.

Goroutines:

694 @ 0x43e44e 0x44e905 0xb234cf 0xb217cc 0xb20bbc 0x769b93 0x471901
#   0xb234ce    golang.org/x/net/http2.(*serverConn).serve+0x88e    golang.org/x/net@v0.17.0/http2/server.go:940
#   0xb217cb    golang.org/x/net/http2.(*Server).ServeConn+0xbcb    golang.org/x/net@v0.17.0/http2/server.go:531
#   0xb20bbb    golang.org/x/net/http2.ConfigureServer.func1+0xfb   golang.org/x/net@v0.17.0/http2/server.go:321
#   0x769b92    net/http.(*conn).serve+0x1212               net/http/server.go:1917

690 @ 0x43e44e 0x436c97 0x46bce5 0x4e11c7 0x4e24ba 0x4e24a8 0x545825 0x558ea5 0x69841b 0x50dc38 0x6985fe 0x695bd0 0x69be98 0x69bea1 0x4b8350 0xb16fe5 0xb16fb4 0xb17725 0xb22667 0x471901
#   0x46bce4    internal/poll.runtime_pollWait+0x84         runtime/netpoll.go:343
#   0x4e11c6    internal/poll.(*pollDesc).wait+0x26         internal/poll/fd_poll_runtime.go:84
#   0x4e24b9    internal/poll.(*pollDesc).waitRead+0x279        internal/poll/fd_poll_runtime.go:89
#   0x4e24a7    internal/poll.(*FD).Read+0x267              internal/poll/fd_unix.go:164
#   0x545824    net.(*netFD).Read+0x24                  net/fd_posix.go:55
#   0x558ea4    net.(*conn).Read+0x44                   net/net.go:179
#   0x69841a    crypto/tls.(*atLeastReader).Read+0x3a           crypto/tls/conn.go:805
#   0x50dc37    bytes.(*Buffer).ReadFrom+0x97               bytes/buffer.go:211
#   0x6985fd    crypto/tls.(*Conn).readFromUntil+0xdd           crypto/tls/conn.go:827
#   0x695bcf    crypto/tls.(*Conn).readRecordOrCCS+0x24f        crypto/tls/conn.go:625
#   0x69be97    crypto/tls.(*Conn).readRecord+0x157         crypto/tls/conn.go:587
#   0x69bea0    crypto/tls.(*Conn).Read+0x160               crypto/tls/conn.go:1369
#   0x4b834f    io.ReadAtLeast+0x8f                 io/io.go:335
#   0xb16fe4    io.ReadFull+0x64                    io/io.go:354
#   0xb16fb3    golang.org/x/net/http2.readFrameHeader+0x33     golang.org/x/net@v0.17.0/http2/frame.go:237
#   0xb17724    golang.org/x/net/http2.(*Framer).ReadFrame+0x84     golang.org/x/net@v0.17.0/http2/frame.go:498
#   0xb22666    golang.org/x/net/http2.(*serverConn).readFrames+0x86    golang.org/x/net@v0.17.0/http2/server.go:820

Heaps:

261: 10690560 [3380: 138444800] @ 0x50ddce 0x50d7fd 0x698594 0x698557 0x6964ff 0x69be98 0x69bea1 0x76322b 0x56d763 0x4b89a3 0x7773bb 0x777298 0x4b89a3 0x4b8806 0x55a386 0x55a343 0x5611ab 0x55fcf0 0x4b87a7 0x781d55 0x781d0f 0x56f74b 0x4b87a7 0x7759a8 0x77598f 0x775525 0x75ccad 0x7849ef 0x471901
#   0x50ddcd    bytes.growSlice+0x8d                bytes/buffer.go:249
#   0x50d7fc    bytes.(*Buffer).grow+0x13c          bytes/buffer.go:151
#   0x698593    bytes.(*Buffer).Grow+0x73           bytes/buffer.go:168
#   0x698556    crypto/tls.(*Conn).readFromUntil+0x36       crypto/tls/conn.go:826
#   0x6964fe    crypto/tls.(*Conn).readRecordOrCCS+0xb7e    crypto/tls/conn.go:676
#   0x69be97    crypto/tls.(*Conn).readRecord+0x157     crypto/tls/conn.go:587
#   0x69bea0    crypto/tls.(*Conn).Read+0x160           crypto/tls/conn.go:1369
#   0x76322a    net/http.(*connReader).Read+0x14a       net/http/server.go:791
#   0x56d762    bufio.(*Reader).Read+0xe2           bufio/bufio.go:230
#   0x4b89a2    io.(*LimitedReader).Read+0x42           io/io.go:480
#   0x7773ba    net/http.(*body).readLocked+0x3a        net/http/transfer.go:839
#   0x777297    net/http.(*body).Read+0x117         net/http/transfer.go:831
#   0x4b89a2    io.(*LimitedReader).Read+0x42           io/io.go:480
#   0x4b8805    io.copyBuffer+0x1a5             io/io.go:430
#   0x55a385    io.Copy+0x65                    io/io.go:389
#   0x55a342    net.genericReadFrom+0x22            net/net.go:671
#   0x5611aa    net.(*TCPConn).readFrom+0x6a            net/tcpsock_posix.go:54
#   0x55fcef    net.(*TCPConn).ReadFrom+0x2f            net/tcpsock.go:130
#   0x4b87a6    io.copyBuffer+0x146             io/io.go:416
#   0x781d54    io.Copy+0x54                    io/io.go:389
#   0x781d0e    net/http.persistConnWriter.ReadFrom+0xe     net/http/transport.go:1801
#   0x56f74a    bufio.(*Writer).ReadFrom+0x18a          bufio/bufio.go:797
#   0x4b87a6    io.copyBuffer+0x146             io/io.go:416
#   0x7759a7    io.Copy+0x47                    io/io.go:389
#   0x77598e    net/http.(*transferWriter).doBodyCopy+0x2e  net/http/transfer.go:412
#   0x775524    net/http.(*transferWriter).writeBody+0x3c4  net/http/transfer.go:370
#   0x75ccac    net/http.(*Request).write+0xbac         net/http/request.go:738
#   0x7849ee    net/http.(*persistConn).writeLoop+0x18e     net/http/transport.go:2424

14: 573440 [7879: 322723840] @ 0x50ddce 0x50d7fd 0x698594 0x698557 0x6964ff 0x69be98 0x69bea1 0x4b8350 0xb16fe5 0xb16fb4 0xb17725 0xb22667 0x471901
#   0x50ddcd    bytes.growSlice+0x8d                    bytes/buffer.go:249
#   0x50d7fc    bytes.(*Buffer).grow+0x13c              bytes/buffer.go:151
#   0x698593    bytes.(*Buffer).Grow+0x73               bytes/buffer.go:168
#   0x698556    crypto/tls.(*Conn).readFromUntil+0x36           crypto/tls/conn.go:826
#   0x6964fe    crypto/tls.(*Conn).readRecordOrCCS+0xb7e        crypto/tls/conn.go:676
#   0x69be97    crypto/tls.(*Conn).readRecord+0x157         crypto/tls/conn.go:587
#   0x69bea0    crypto/tls.(*Conn).Read+0x160               crypto/tls/conn.go:1369
#   0x4b834f    io.ReadAtLeast+0x8f                 io/io.go:335
#   0xb16fe4    io.ReadFull+0x64                    io/io.go:354
#   0xb16fb3    golang.org/x/net/http2.readFrameHeader+0x33     golang.org/x/net@v0.17.0/http2/frame.go:237
#   0xb17724    golang.org/x/net/http2.(*Framer).ReadFrame+0x84     golang.org/x/net@v0.17.0/http2/frame.go:498
#   0xb22666    golang.org/x/net/http2.(*serverConn).readFrames+0x86    golang.org/x/net@v0.17.0/http2/server.go:820
...

There's no memory leak here... just normal traffic. Not even a lot of traffic.

What does top or htop on the machine show?

I'd also be interested if you can reproduce this without the API backend.

KGOH commented 4 months ago

@mholt can this imply that my app doesn't close connections?

I'll get back to you with htop later

mholt commented 4 months ago

Quite possibly. It's a bit unfortunate that the graphs only show a single process. That's why I'm curious what the output of top is.

KGOH commented 4 months ago

@mholt, hello. The app is a Java Jetty server with an idle timeout of 200000 ms (about 3.3 minutes) (doc).

Here's a graph of the Java app, Grafana, and Caddy processes for the last month. As you can see, there was a restart after a certain memory threshold. The Java app doesn't seem to grow in memory constantly. It receives mobile traffic, which can probably drop due to unstable internet connections.

As I can see in the Caddy docs, Caddy's default idle timeout is 5 minutes (https://caddyserver.com/docs/caddyfile/options#timeouts). This doesn't explain the growing memory consumption.

mholt commented 4 months ago

What are your clients / traffic patterns like?

Try setting some timeouts: https://caddyserver.com/docs/caddyfile/options#timeouts

The majority of the (few) goroutines that show up in the profiles are waiting for reads.

KGOH commented 3 months ago

Timeouts didn't help (the blue line that ends abruptly is Caddy before I added timeouts; the red line is Caddy after I restarted it with the timeouts config added).

I added the following lines to the Caddyfile:

{
  servers {
    timeouts {
      read_body   5m
      read_header 5m
      write       5m
      idle        5m
    }
  }
}

I'm not sure how to describe the traffic pattern. In short, there are a lot of mobile connections; I guess they can drop without properly closing the connection. At the same time I observe a large number of exceptions like this, ~15,000 in 14 days: org.eclipse.jetty.io.EofException: Early EOF

KGOH commented 3 months ago

I'm not sure whether this is related, but on my memory usage graphs you can also see that the Java app's memory sometimes grows in a somewhat similar way (not in this case though!).

Right now I'm also observing a memory leak within the Java app.

For the sake of context, here's another issue I created in the Jetty repo: https://github.com/jetty/jetty.project/issues/11858

mholt commented 3 months ago

Interesting... one other thought I have is, what if you disable HTTP/3?

https://caddyserver.com/docs/caddyfile/options#protocols
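
For example (a sketch based on the linked global options, not a config from this thread), serving only HTTP/1.1 and HTTP/2 would look like:

{
  servers {
    protocols h1 h2
  }
}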

hackerfactor commented 3 months ago

Has anyone been able to replicate this problem? Does it happen with 2.8.4?

zguig52 commented 2 months ago

I also just had server memory swapping detected due to Caddy after upgrading to 2.8.4. It seems the issue is still there (on my side, I only observe it after upgrading from 2.7.6 to 2.8.4). I don't know if it's related, but I also use the Coraza WAF module (v2.0.0-rc.3).

francislavoie commented 2 months ago

> I don't know if it's related, but I also use the Coraza WAF module (v2.0.0-rc.3).

Very much could be related. We can't really take builds with plugins as any kind of evidence because the problem could be in the plugin.

mholt commented 2 months ago

@zguig52 A profile would help if you could get one. https://caddyserver.com/docs/profiling

KGOH commented 1 month ago

Since my Java app's memory was also growing, I replaced Caddy with nginx to test whether the issue is in Caddy or in the Java app. I've been observing memory consumption for a week now, and the Java app's memory growth rate has decreased. Interesting.

mohammed90 commented 1 month ago

We have a default keepalive config while nginx doesn't. Can you try setting the keepalive* options in reverse_proxy (docs) to something suitable for your app's workload and monitor the memory consumption? You did mention that your upstream app doesn't close connections.
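
As an illustration, a sketch of what that could look like in the Caddyfile; the keepalive* option names come from the reverse_proxy HTTP transport docs, and the values are placeholders to tune for your workload:

{$COLLECTOR_DOMAIN} {
  reverse_proxy myapp:8080 {
    transport http {
      # how long an idle upstream connection may sit in the pool before it is closed
      keepalive 30s
      # caps on the number of pooled idle connections
      keepalive_idle_conns 16
      keepalive_idle_conns_per_host 8
    }
  }
}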

joakime commented 1 month ago

Jetty follows the HTTP/1.0 (RFC1945) and HTTP/1.1 (RFC9112) specs.

If the client says HTTP/1.0 in its request, the connection defaults to closing after the response unless the client sends Connection: keep-alive.

If the client says HTTP/1.1 in its request, the connection defaults to staying open (persistent) unless either side sends Connection: close.

If you want to see a more comprehensive set of tests related to the Connection header in Jetty (not limited to just close and keep-alive values), see the testcases.

https://github.com/jetty/jetty.project/blob/661546ecffbd2f2d8f1cf7b5ebebec8e4f06549e/jetty-core/jetty-server/src/test/java/org/eclipse/jetty/server/HttpConnectionTest.java#L1593-L1689

That shows the four pieces of information Jetty uses to determine connection behavior: the user-agent's request HTTP spec version, the user-agent's request Connection header value, the application-specified response Connection header value, and the response Connection header value seen by the user-agent.