envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy with golang filter crashes randomly #37225

Open tahmoor opened 3 days ago

tahmoor commented 3 days ago

We implemented an Envoy golang filter on Envoy 1.32.1, and it crashes randomly. After further investigation, we found that the crash happens during Go garbage collection. We also implemented a golang filter that does nothing and saw the same random crash. Note also that when we call runtime.GC() manually on each request, the crash rate increases.
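For readers who want to see what "calling runtime.GC() on each request" does to the collector, here is a minimal standalone sketch. It does not use the Envoy Go filter API; `handleRequest` and `gcCyclesDuring` are hypothetical names that only stand in for a per-request callback, and the sketch assumes nothing about the attached plugin.

```go
package main

import (
	"fmt"
	"runtime"
)

// handleRequest stands in for a per-request filter callback.
// Hypothetical name: this is NOT the Envoy Go filter API.
// It creates some short-lived garbage and optionally forces a full GC cycle.
func handleRequest(forceGC bool) {
	buf := make([]byte, 64<<10)
	buf[0] = 1
	_ = buf
	if forceGC {
		// runtime.GC blocks until a full collection has completed.
		runtime.GC()
	}
}

// gcCyclesDuring reports how many GC cycles completed while n simulated
// requests were handled.
func gcCyclesDuring(n int, forceGC bool) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		handleRequest(forceGC)
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	fmt.Println("GC cycles without forcing:", gcCyclesDuring(1000, false))
	fmt.Println("GC cycles with runtime.GC():", gcCyclesDuring(1000, true))
}
```

With forcing enabled, at least one collection runs per simulated request, so any bug that only shows up during a collection gets many more chances to trigger. That would be consistent with the higher crash rate described above.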

Here is the call stack of the crash:

[2024-11-15 12:11:10.256][32][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x34c19902aab0
[2024-11-15 12:11:10.256][32][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-11-15 12:11:10.256][32][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: e3b4a6e9570da15ac1caffdded17a8bebdc7dfc9/1.32.1/Clean/RELEASE/BoringSSL
[2024-11-15 12:11:10.256][32][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 56501b83c000-56501f3b6000 /usr/local/bin/envoy
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #0: runtime.sigfwd.abi0 [0x7fe229faa7e0]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #1: runtime.sigfwdgo [0x7fe229f833b1]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #2: runtime.sigtrampgo [0x7fe229f81d45]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #3: runtime.sigtramp.abi0 [0x7fe229faa849]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #4: runtime.sigfwd.abi0 [0x7fe1dfbe6920]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #5: runtime.sigfwdgo [0x7fe1dfbbf3b1]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #6: runtime.sigtrampgo [0x7fe1dfbbdd45]
[2024-11-15 12:11:10.257][32][critical][backtrace] [./source/server/backtrace.h:119] #7: runtime.sigtramp.abi0 [0x7fe1dfbe6989]
[2024-11-15 12:11:10.258][32][critical][backtrace] [./source/server/backtrace.h:119] #8: runtime.sigfwd.abi0 [0x7fe193e2e2e0]
[2024-11-15 12:11:10.259][32][critical][backtrace] [./source/server/backtrace.h:119] #9: runtime.sigfwdgo [0x7fe193e063f1]
[2024-11-15 12:11:10.259][32][critical][backtrace] [./source/server/backtrace.h:119] #10: runtime.sigtrampgo [0x7fe193e04d85]
[2024-11-15 12:11:10.260][32][critical][backtrace] [./source/server/backtrace.h:119] #11: runtime.sigtramp.abi0 [0x7fe193e2e349]
[2024-11-15 12:11:10.260][32][critical][backtrace] [./source/server/backtrace.h:121] #12: [0x7fe22d76c520]
[2024-11-15 12:11:10.260][32][critical][backtrace] [./source/server/backtrace.h:119] #13: envoyGoFilterHttpFinalize [0x56501d838b75]
[2024-11-15 12:11:10.260][32][critical][backtrace] [./source/server/backtrace.h:119] #14: runtime.asmcgocall.abi0 [0x7fe193e2c481]
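When reading a backtrace like this, it can help to check which frames fall inside the "Address mapping" range of the Envoy binary and which belong to other loaded objects such as the Go shared library. The following is a small sketch of that check. The helper names `parseMapping` and `offsetInMapping` are hypothetical and are not part of Envoy's tooling (tools/stack_decode.py is the tool the log itself points to).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMapping parses a "start-end" range (hex, no 0x prefix) as printed in
// the "Address mapping:" log line.
func parseMapping(s string) (start, end uint64, err error) {
	lo, hi, ok := strings.Cut(s, "-")
	if !ok {
		return 0, 0, fmt.Errorf("malformed mapping %q", s)
	}
	if start, err = strconv.ParseUint(lo, 16, 64); err != nil {
		return 0, 0, err
	}
	if end, err = strconv.ParseUint(hi, 16, 64); err != nil {
		return 0, 0, err
	}
	return start, end, nil
}

// offsetInMapping returns addr relative to start when addr lies in
// [start, end), and false otherwise.
func offsetInMapping(addr, start, end uint64) (uint64, bool) {
	if addr < start || addr >= end {
		return 0, false
	}
	return addr - start, true
}

func main() {
	start, end, err := parseMapping("56501b83c000-56501f3b6000")
	if err != nil {
		panic(err)
	}
	// Frame #13 (envoyGoFilterHttpFinalize) and frame #14
	// (runtime.asmcgocall.abi0) from the backtrace above.
	for _, addr := range []uint64{0x56501d838b75, 0x7fe193e2c481} {
		if off, ok := offsetInMapping(addr, start, end); ok {
			fmt.Printf("%#x: inside envoy binary, offset %#x\n", addr, off)
		} else {
			fmt.Printf("%#x: outside envoy binary\n", addr)
		}
	}
}
```

Applied to the log above, frame #13 lies inside /usr/local/bin/envoy, while the runtime.* frames lie outside it, in memory mapped for other objects.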

The source code of our sample golang plugin is attached: gc.zip

soulxu commented 3 days ago

cc @doujiang24

doujiang24 commented 2 days ago

@tahmoor Thanks for your feedback. This seems weird to me, since it's a simple case. Please provide more clues:

  1. How did you build the Envoy binary? Or can you reproduce it using the official Docker image, envoyproxy/envoy:contrib-v1.32.1?
  2. Which golang version are you using, and how did you build the golang .so file? Which glibc version is on your build machine?

tahmoor commented 2 days ago

@doujiang24 Thanks for your response. We used the official Envoy image, docker.io/envoyproxy/envoy:contrib-v1.32.1. Our plugin source code is compatible with Go 1.22, and to build the plugin we installed the official Go 1.23.1 from https://go.dev/dl/ on the above envoy-contrib image. We also built the golang plugin using the golang:1.22.9-bullseye and golang:1.23.1-bullseye images and saw the same problem.

doujiang24 commented 2 days ago

@tahmoor I tried envoyproxy/envoy:contrib-v1.32.1 + golang:1.22.9-bullseye, but was not able to reproduce it:

wrk -t 1 -c 100 -d 100 http://localhost:8089
Running 2m test @ http://localhost:8089
  1 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.82ms    5.94ms 141.23ms   66.99%
    Req/Sec     8.57k   302.55     9.30k    78.70%
  853045 requests in 1.67m, 124.64MB read
Requests/sec:   8529.42
Transfer/sec:      1.25MB

Here is the full runnable demo: https://github.com/doujiang24/test-golang-segfault

Please provide more info from your side. Feel free to create a PR to the test demo repo so that I can reproduce it.