cloudflare / cloudflared

Cloudflare Tunnel client (formerly Argo Tunnel)
https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/tunnel-guide
Apache License 2.0
9.3k stars 827 forks source link

http2 response writer panic & short write errors #450

Open gvccvwangmingn2 opened 3 years ago

gvccvwangmingn2 commented 3 years ago

Experiencing errors in running latest cloudflared tunnel (server-side, proxying web server tcp, used for long-polling&long-download), but no errors are thrown on cloudflared client side.

Web server is running pure http/1.1, no http/2 support. Web server is encapsulated with tcp (tcp://127.0.0.1:8081), not directly http (cannot be access from browser, must use cloudflared client)

2021-08-30T18:00:23Z INF Connection b283f476-5c89-4578-992c-cc10f12dcecd registered connIndex=0 location=DUS
2021-08-30T18:00:24Z INF Connection c6ea521a-59b5-420b-b49f-1596d5b8a545 registered connIndex=1 location=DUS
2021-08-30T18:00:25Z INF Connection e04925fb-b9be-4c1d-8d58-b2d7713099e7 registered connIndex=2 location=AMS
2021-08-30T18:00:26Z INF Connection 2d6693ed-ccce-4752-84b5-e414af665fc8 registered connIndex=3 location=AMS
Recover from http2 response writer panic, error (0xf29980,0x124f840)
2021-08-30T18:04:25Z ERR failed to write ping message error="short write"
2021-08-30T18:05:23Z ERR failed to write ping message error="short write"
2021-08-30T18:05:29Z ERR failed to write ping message error="short write"

Do you know why that happens? Does it have something to do with my connection? I saw other person having similar issue #411, but I'm not using docker/service, it's running on pure terminal, ubuntu 20.04.

From #411 it's says cloudflared -> tcp service issue, but looking at code, it seems to be connection between cloudflare edge and the cloudflared.

sudarshan-reddy commented 3 years ago

Can you give us some more information about your environment please?

  1. Do you have ingress rules configured? Can you show us?
  2. When does this error happen? As soon as you connect?
  3. What is the eyeball ? i.e. How are you accessing the origin service proxied by cloudflared?
gvccvwangmingn2 commented 3 years ago

Can you give us some more information about your environment please?

  1. Do you have ingress rules configured? Can you show us?
  2. When does this error happen? As soon as you connect?
  3. What is the eyeball ? i.e. How are you accessing the origin service proxied by cloudflared?
  1. Yes, ingress rules is configured. Just simple config:
tunnel: tunnel
ingress:
- hostname: web.xxxx.com
  service: tcp://localhost:8081
- service: http_status:404
  1. Error happens randomly, not as soon as connect. It is able to establish tcp connection, but when it have open/active connections, then error happens randomly

  2. Eyeball is through browser/nodejs, using cloudflared:

./cloudflared access tcp --hostname web.xxxx.com --url localhost:18081

I tried running using h2mux protocol (-p h2mux), it just throw errors constantly after ~1 minute of connecting to cloudflare edge.

2021-08-30T19:21:42Z INF Starting metrics server on 100.90.94.86:41239/metrics
2021-08-30T19:21:42Z INF Connection fe296086-48c0-419d-b768-25fed169d2e6 registered connIndex=0 location=LHR
2021-08-30T19:21:43Z INF Connection 2fc2b1de-379d-40c6-9519-de2ad145519a registered connIndex=1 location=LHR
2021-08-30T19:21:44Z INF Connection ccf2980f-23ec-45b9-a9d4-4915837fb262 registered connIndex=2 location=AMS
2021-08-30T19:21:45Z INF Connection c9699f24-e9c8-4da3-afe7-4caa3d62c43a registered connIndex=3 location=AMS
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
2021-08-30T19:22:47Z ERR failed to write ping message error=EOF
sudarshan-reddy commented 3 years ago

Just to clarify, what was the protocol you ran this with? h2mux or http2?

sudarshan-reddy commented 3 years ago

short writes can happen if your connecting application somehow does not read all of the data being sent. This results in an incomplete write but no errors. Are you sure your nodejs application is not throwing any errors? Or logging network copies?

gvccvwangmingn2 commented 3 years ago

Just to clarify, what was the protocol you ran this with? h2mux or http2?

running with http/2

gvccvwangmingn2 commented 3 years ago

short writes can happen if your connecting application somehow does not read all of the data being sent. This results in an incomplete write but no errors. Are you sure your nodejs application is not throwing any errors? Or logging network copies?

hm, I'll try to check on application side, but no errors thrown from cloudflared client side. from nodejs side, it just closes the connection.

gvccvwangmingn2 commented 3 years ago

@sudarshan-reddy, did some more debugging on client side, not seeing any errors/network indication.

I also did try changes in server side, by using reverse proxies such as nginx & caddy instead directly to my server, but still seeing those errors (http2 response writer panic & short write).

I think I have POC you can try, but sorry for my bad english. server & client running ubuntu 20.04, v2021.8.7-6

  1. Create static web server, have 1gb test file on it
  2. Create & run Argo tunnel, BUT don't expose http directly, use tcp service instead (tcp://127.0.0.1:8080)
  3. Run cloudflared in client (similar to my setup above)
  4. Download axel (wget alternative, apt install axel -y)
  5. Execute axel -n 100 http://localhost:18081/1gb.test, running 100 simultaneous connections (or maybe more)
  6. http2 response panic appears in cloudflared server side

*if only 1-5 concurrent downloads, no errors shown. it looks like this error is thrown when there is high number of connections

gvccvwangmingn2 commented 3 years ago

Another update, recompiled cloudflared (latest master) with go's race & debug flags, this is what I got (not a Go guy, not sure if related).

When terminating the process, there's a log saying Found 10 data race(s), but I only uploaded some of it below.

==================
WARNING: DATA RACE
Read at 0x00c0006d3c28 by goroutine 1199:
  bufio.(*Writer).Buffered()
      /usr/local/go/src/bufio/bufio.go:627 +0x84
  golang.org/x/net/http2.(*responseWriter).Flush()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2580 +0x55
  golang.org/x/net/http2.(*responseWriter).handlerDone()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2717 +0x8d
  golang.org/x/net/http2.(*serverConn).runHandler.func1()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2166 +0x4b5
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2170 +0xcb

Previous write at 0x00c0006d3c28 by goroutine 1433:
  bufio.(*Writer).Flush()
      /usr/local/go/src/bufio/bufio.go:615 +0x1fb
  golang.org/x/net/http2.(*responseWriter).Flush()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2581 +0xb0
  github.com/cloudflare/cloudflared/connection.(*http2RespWriter).Write()
      /root/cloudflared/connection/http2.go:239 +0x17e
  github.com/cloudflare/cloudflared/connection.(*HTTPResponseReadWriteAcker).Write()
      /root/cloudflared/connection/connection.go:135 +0x79
  github.com/gobwas/ws.WriteFrame()
      /root/cloudflared/vendor/github.com/gobwas/ws/write.go:95 +0xdd
  github.com/gobwas/ws/wsutil.writeFrame()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/writer.go:449 +0x193
  github.com/gobwas/ws/wsutil.WriteMessage()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:161 +0xa8
  github.com/gobwas/ws/wsutil.WriteServerMessage()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:167 +0x66
  github.com/gobwas/ws/wsutil.WriteServerBinary()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:179 +0x65
  github.com/cloudflare/cloudflared/websocket.(*Conn).Write()
      /root/cloudflared/websocket/connection.go:119 +0x2f
  io.copyBuffer()
      /usr/local/go/src/io/io.go:425 +0x2be
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x74
  github.com/cloudflare/cloudflared/websocket.copyData()
      /root/cloudflared/websocket/websocket.go:98 +0x75
  github.com/cloudflare/cloudflared/websocket.Stream.func1()
      /root/cloudflared/websocket/websocket.go:36 +0xed

Goroutine 1199 (running) created at:
  golang.org/x/net/http2.(*serverConn).processHeaders()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1898 +0x944
  golang.org/x/net/http2.(*serverConn).processFrame()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1412 +0x45e
  golang.org/x/net/http2.(*serverConn).processFrameFromReader()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1370 +0x7d4
  golang.org/x/net/http2.(*serverConn).serve()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:869 +0x154a
  golang.org/x/net/http2.(*Server).ServeConn()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:472 +0xdc4
  github.com/cloudflare/cloudflared/connection.(*HTTP2Connection).Serve()
      /root/cloudflared/connection/http2.go:80 +0x127
  github.com/cloudflare/cloudflared/origin.ServeHTTP2.func1()
      /root/cloudflared/origin/tunnel.go:448 +0x53
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /root/cloudflared/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x94

Goroutine 1433 (finished) created at:
  github.com/cloudflare/cloudflared/websocket.Stream()
      /root/cloudflared/websocket/websocket.go:35 +0xa4
  github.com/cloudflare/cloudflared/ingress.DefaultStreamHandler()
      /root/cloudflared/ingress/origin_connection.go:28 +0x88
  github.com/cloudflare/cloudflared/ingress.(*tcpOverWSConnection).Stream()
      /root/cloudflared/ingress/origin_connection.go:51 +0xda
  github.com/cloudflare/cloudflared/origin.(*Proxy).proxyStream()
      /root/cloudflared/origin/proxy.go:249 +0x261
  github.com/cloudflare/cloudflared/origin.(*Proxy).ProxyHTTP()
      /root/cloudflared/origin/proxy.go:99 +0xaa4
  github.com/cloudflare/cloudflared/connection.(*HTTP2Connection).ServeHTTP()
      /root/cloudflared/connection/http2.go:119 +0x235
  net/http.Handler.ServeHTTP-fm()
      /usr/local/go/src/net/http/server.go:87 +0x6d
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2168 +0xc5
==================
==================
WARNING: DATA RACE
Read at 0x00c0001253db by goroutine 1199:
  golang.org/x/net/http2.(*responseWriter).handlerDone()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2715 +0x5a
  golang.org/x/net/http2.(*serverConn).runHandler.func1()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2166 +0x4b5
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2170 +0xcb

Previous write at 0x00c0001253db by goroutine 1433:
  golang.org/x/net/http2.(*responseWriterState).writeChunk()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2503 +0xc77
  golang.org/x/net/http2.chunkWriter.Write()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2382 +0x5a
  bufio.(*Writer).Flush()
      /usr/local/go/src/bufio/bufio.go:607 +0x141
  golang.org/x/net/http2.(*responseWriter).Flush()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2581 +0xb0
  github.com/cloudflare/cloudflared/connection.(*http2RespWriter).Write()
      /root/cloudflared/connection/http2.go:239 +0x17e
  github.com/cloudflare/cloudflared/connection.(*HTTPResponseReadWriteAcker).Write()
      /root/cloudflared/connection/connection.go:135 +0x79
  github.com/gobwas/ws.WriteFrame()
      /root/cloudflared/vendor/github.com/gobwas/ws/write.go:95 +0xdd
  github.com/gobwas/ws/wsutil.writeFrame()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/writer.go:449 +0x193
  github.com/gobwas/ws/wsutil.WriteMessage()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:161 +0xa8
  github.com/gobwas/ws/wsutil.WriteServerMessage()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:167 +0x66
  github.com/gobwas/ws/wsutil.WriteServerBinary()
      /root/cloudflared/vendor/github.com/gobwas/ws/wsutil/helper.go:179 +0x65
  github.com/cloudflare/cloudflared/websocket.(*Conn).Write()
      /root/cloudflared/websocket/connection.go:119 +0x2f
  io.copyBuffer()
      /usr/local/go/src/io/io.go:425 +0x2be
  io.Copy()
      /usr/local/go/src/io/io.go:382 +0x74
  github.com/cloudflare/cloudflared/websocket.copyData()
      /root/cloudflared/websocket/websocket.go:98 +0x75
  github.com/cloudflare/cloudflared/websocket.Stream.func1()
      /root/cloudflared/websocket/websocket.go:36 +0xed

Goroutine 1199 (running) created at:
  golang.org/x/net/http2.(*serverConn).processHeaders()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1898 +0x944
  golang.org/x/net/http2.(*serverConn).processFrame()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1412 +0x45e
  golang.org/x/net/http2.(*serverConn).processFrameFromReader()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:1370 +0x7d4
  golang.org/x/net/http2.(*serverConn).serve()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:869 +0x154a
  golang.org/x/net/http2.(*Server).ServeConn()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:472 +0xdc4
  github.com/cloudflare/cloudflared/connection.(*HTTP2Connection).Serve()
      /root/cloudflared/connection/http2.go:80 +0x127
  github.com/cloudflare/cloudflared/origin.ServeHTTP2.func1()
      /root/cloudflared/origin/tunnel.go:448 +0x53
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /root/cloudflared/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x94

Goroutine 1433 (finished) created at:
  github.com/cloudflare/cloudflared/websocket.Stream()
      /root/cloudflared/websocket/websocket.go:35 +0xa4
  github.com/cloudflare/cloudflared/ingress.DefaultStreamHandler()
      /root/cloudflared/ingress/origin_connection.go:28 +0x88
  github.com/cloudflare/cloudflared/ingress.(*tcpOverWSConnection).Stream()
      /root/cloudflared/ingress/origin_connection.go:51 +0xda
  github.com/cloudflare/cloudflared/origin.(*Proxy).proxyStream()
      /root/cloudflared/origin/proxy.go:249 +0x261
  github.com/cloudflare/cloudflared/origin.(*Proxy).ProxyHTTP()
      /root/cloudflared/origin/proxy.go:99 +0xaa4
  github.com/cloudflare/cloudflared/connection.(*HTTP2Connection).ServeHTTP()
      /root/cloudflared/connection/http2.go:119 +0x235
  net/http.Handler.ServeHTTP-fm()
      /usr/local/go/src/net/http/server.go:87 +0x6d
  golang.org/x/net/http2.(*serverConn).runHandler()
      /root/cloudflared/vendor/golang.org/x/net/http2/server.go:2168 +0xc5
==================
sudarshan-reddy commented 3 years ago

The Go data race is probably a false negative that sometimes happens when terminating io.copying streams. If you see the make file, all our tests are running with race.

sudarshan-reddy commented 3 years ago

Thank you for the POC and really appreciate you taking the time to give better directions. I will take a look and get back to you in a bit.

Also your English is great! I am able to completely understand your point! ❤️

gvccvwangmingn2 commented 3 years ago

This issue seems to be fixed by 6238fd902266e6cf90d3c5737411891668b4063d. Currently running it for 24h, no issues at all.

Thanks @chungthuang @sudarshan-reddy! Closing it for now.

chungthuang commented 3 years ago

@gvccvwangmingn2 thank you for reporting the issue!

gvccvwangmingn2 commented 3 years ago

@chungthuang, hm, it occured again this past couple of hours. thankfully there's trace now, here it is:

Recover from http2 response writer panic, error goroutine 118451921 [running]:
runtime/debug.Stack(0xc000a2e948, 0xf2aa60, 0x1250ca0)
        /usr/local/go/src/runtime/debug/stack.go:24 +0x9f
github.com/cloudflare/cloudflared/connection.(*http2RespWriter).Write.func1()
        /cfsetup_build/connection/http2.go:235 +0x46
panic(0xf2aa60, 0x1250ca0)
        /usr/local/go/src/runtime/panic.go:965 +0x1b9
golang.org/x/net/http2.(*responseWriter).write(0xc00000e250, 0x2, 0xc001e2d350, 0x2, 0xe, 0x0, 0x0, 0x0, 0x40db1b, 0xc0000b0120)
        /cfsetup_build/vendor/golang.org/x/net/http2/server.go:2692 +0x1f7
golang.org/x/net/http2.(*responseWriter).Write(0xc00000e250, 0xc001e2d350, 0x2, 0xe, 0x0, 0xe, 0x7fe40906e108)
        /cfsetup_build/vendor/golang.org/x/net/http2/server.go:2681 +0x56
github.com/cloudflare/cloudflared/connection.(*http2RespWriter).Write(0xc00061e000, 0xc001e2d350, 0x2, 0xe, 0x0, 0x0, 0x0)
        /cfsetup_build/connection/http2.go:238 +0x8c
github.com/cloudflare/cloudflared/connection.(*HTTPResponseReadWriteAcker).Write(0xc0028e2330, 0xc001e2d350, 0x2, 0xe, 0x2740, 0xc0000bde60, 0x18ebfa0)
        /cfsetup_build/connection/connection.go:136 +0x52
github.com/gobwas/ws.WriteHeader(0x7fe3e1fb7990, 0xc0028e2330, 0x20001, 0x78, 0x0, 0x0)
        /cfsetup_build/vendor/github.com/gobwas/ws/write.go:84 +0x111
github.com/gobwas/ws.WriteFrame(0x7fe3e1fb7990, 0xc0028e2330, 0x20001, 0x78, 0xc000a18000, 0x78, 0x8000, 0xc000a18000, 0x8000)
        /cfsetup_build/vendor/github.com/gobwas/ws/write.go:91 +0x45
github.com/gobwas/ws/wsutil.writeFrame(0x7fe3e1fb7990, 0xc0028e2330, 0xc002010201, 0xc000a18000, 0x78, 0x8000, 0x0, 0x0)
        /cfsetup_build/vendor/github.com/gobwas/ws/wsutil/writer.go:449 +0x15d
github.com/gobwas/ws/wsutil.WriteMessage(...)
        /cfsetup_build/vendor/github.com/gobwas/ws/wsutil/helper.go:161
github.com/gobwas/ws/wsutil.WriteServerMessage(...)
        /cfsetup_build/vendor/github.com/gobwas/ws/wsutil/helper.go:167
github.com/gobwas/ws/wsutil.WriteServerBinary(...)
        /cfsetup_build/vendor/github.com/gobwas/ws/wsutil/helper.go:179
github.com/cloudflare/cloudflared/websocket.(*Conn).Write(0xc0026b4700, 0xc000a18000, 0x78, 0x8000, 0x78, 0x0, 0x0)
        /cfsetup_build/websocket/connection.go:126 +0x8b
io.copyBuffer(0x7fe3e1e9f160, 0xc0026b4700, 0x1266a40, 0xc002388018, 0xc000a18000, 0x8000, 0x8000, 0x108c280, 0x4d7700, 0x1266a40)
        /usr/local/go/src/io/io.go:425 +0x21c
io.Copy(...)
        /usr/local/go/src/io/io.go:382
github.com/cloudflare/cloudflared/websocket.copyData(0x7fe3e1e9f160, 0xc0026b4700, 0x1266a40, 0xc002388018, 0x10e7460, 0xe, 0x127a550, 0xc000503380, 0xc002240790)
        /cfsetup_build/websocket/websocket.go:98 +0x5a
github.com/cloudflare/cloudflared/websocket.Stream.func1(0x126e1f0, 0xc0026b4700, 0x7fe3e1fb78e8, 0xc002388018, 0xc0000bc120, 0xc00190e0c0)
        /cfsetup_build/websocket/websocket.go:36 +0xca
created by github.com/cloudflare/cloudflared/websocket.Stream
        /cfsetup_build/websocket/websocket.go:35 +0x93

it worked flawlessly for 2 days straight, maybe some server-side change?

nmldiegues commented 3 years ago

Thanks for reporting it @gvccvwangmingn2 Again, this should not be impacting actual traffic, but we'll look into making sure it'll be fixed nonetheless.

chungthuang commented 3 years ago

@gvccvwangmingn2 release 2021.10.4 fixes the issue.

nmldiegues commented 3 years ago

Unfortunately we have detected that this fix caused a regression on long-lived websocket sessions (such as SSH or others proxied via cloudflared access tcp and family) so we will revert it and release cloudflared 2021.10.5

abelinkinbio commented 2 years ago

Thanks for raising this. The issue was resolved with the release of cloudflared version 2021.10.5.

gvccvwangmingn2 commented 2 years ago

@abelinkinbio isn't that build reverted from f8fbbcd806bb5ad7eee6971c4a38ee7eaedf5b67? haven't tested again though, might test next week

nmldiegues commented 2 years ago

Correct. This was reverted since the fix introduced a regression.

Given that this panic does not cause a real issue, we ended up not investing more time on it yet, but the issue should remain open.