envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Postgres proxy: Envoy crash during pgbench test #36471

Open shiponcs opened 1 month ago

shiponcs commented 1 month ago

Title: Envoy crash during pgbench test

Description: We ran pgbench against a Postgres server proxied by Envoy. During the test, Envoy crashed with the following log:

[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x0
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: ca7ce529646b1f7c8c760e0e249e285cee4549ed/1.30.2/Modified/RELEASE/BoringSSL
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 560479564000-56047cf63000 /usr/local/bin/envoy
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7fa621681520]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #1: [0x56047b34bfc4]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #2: [0x56047b34949a]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #3: [0x56047b348d88]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #4: [0x56047b345b6d]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #5: [0x56047b3304fa]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #6: [0x56047b32fd00]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #7: [0x56047b3443e7]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #8: [0x56047bb7aec5]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #9: [0x56047bb72663]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #10: [0x56047bb6f2f8]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #11: [0x56047bb4d2f1]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #12: [0x56047bb4e89d]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #13: [0x56047ca79480]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #14: [0x56047ca77dc1]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #15: [0x56047ad6332f]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #16: [0x56047cafac63]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #17: [0x7fa6216d3ac3]
ConnectionImpl 0x26537f7fe640, connecting_: 0, bind_error_: 0, state(): Open, read_buffer_limit_: 32768
socket_: 
  ListenSocketImpl 0x26537f0aeab0, transport_protocol_: raw_buffer
  connection_info_provider_: 
    ConnectionInfoSetterImpl 0x26537f09e518, remote_address_: 100.126.0.213:54396, direct_remote_address_: 100.126.0.213:54396, local_address_: 100.126.5.153:50815, server_name_:
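
The backtrace above prints raw runtime addresses together with an "Address mapping" line for the Envoy binary. As a rough aid for reading it, the sketch below subtracts the mapping start from each frame address to get an offset into /usr/local/bin/envoy. It is not a replacement for tools/stack_decode.py, which the log itself recommends. The function name `frame_offsets` is made up for this example, and treating the mapping start as the load base of the binary is an assumption.

```python
import re

# Matches the "Address mapping: <start>-<end> <path>" line of the backtrace.
MAPPING_RE = re.compile(r"Address mapping: ([0-9a-f]+)-([0-9a-f]+) (\S+)")
# Matches frame entries such as "#1: [0x56047b34bfc4]".
FRAME_RE = re.compile(r"#(\d+): \[0x([0-9a-f]+)\]")


def frame_offsets(log_text):
    """Return (frame_number, offset) pairs for frames inside the mapped binary.

    Frames outside the mapping (for example, shared-library frames) are skipped.
    Assumption: the mapping start is the load base of the binary.
    """
    mapping = MAPPING_RE.search(log_text)
    if mapping is None:
        raise ValueError("no 'Address mapping' line found")
    start = int(mapping.group(1), 16)
    end = int(mapping.group(2), 16)

    offsets = []
    for number, address in FRAME_RE.findall(log_text):
        addr = int(address, 16)
        if start <= addr < end:
            offsets.append((int(number), addr - start))
    return offsets


if __name__ == "__main__":
    # Three lines taken from the log above, with the timestamp prefix dropped.
    sample = (
        "Address mapping: 560479564000-56047cf63000 /usr/local/bin/envoy\n"
        "#0: [0x7fa621681520]\n"
        "#1: [0x56047b34bfc4]\n"
    )
    for number, offset in frame_offsets(sample):
        print(f"#{number}: {offset:#x}")
```

Frames #0 and #17 fall outside the Envoy mapping, so the sketch skips them. The remaining offsets only become function names and line numbers when resolved against a binary that still has its symbols.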

Testing scope:
Envoy configuration: https://gist.github.com/shiponcs/600fff61cddba5911bc3b9e538a4fc8c
pgbench command: pgbench -h localhost -p 50815 -U postgres -c 50 -j 1 -T 00 example

Relevant: We tried to find the cause. By running git bisect, we found that this commit introduces the crash. Reverting the changes from that commit resolves the issue.
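
The exact bisect procedure is not shown in this report. As a minimal sketch of how such a search could be automated, the helper below maps an Envoy log to the exit codes that `git bisect run` understands: 0 for good, 1 for bad, 125 for "skip this commit". The script layout, the `bisect_verdict` name, and the `build_ok` flag are hypothetical. Only the crash marker string comes from the log above.

```python
import sys

# Text that the crash log above prints when Envoy segfaults.
CRASH_MARKER = "Caught Segmentation fault"

# Exit codes interpreted by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_verdict(envoy_log, build_ok=True):
    """Classify one bisect step from the captured Envoy log text.

    build_ok is a hypothetical flag: pass False when the commit could not
    be built or tested, so that bisect skips it instead of guessing.
    """
    if not build_ok:
        return SKIP
    return BAD if CRASH_MARKER in envoy_log else GOOD


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python bisect_check.py <path-to-envoy-log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        sys.exit(bisect_verdict(log_file.read()))
```

A wrapper would still have to build Envoy at each step, start it with the configuration from the gist, run the pgbench command, and capture the log before calling this check.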

alyssawilk commented 1 month ago

Thanks for running this by Envoy security before (with approval) posting in the clear.

@fabriziomello @cpakulski can you take a look? I don't think we should necessarily roll back an Envoy build config change based on a broken contrib extension.

cpakulski commented 1 month ago

Thanks for reporting it. I will try to repro the crash to find the cause.

> I don't think we should necessarily roll back an Envoy build config change based on a broken contrib extension.

I agree. I think that the build config probably just exposes the problem but is not the cause of the crash.

fabriziomello commented 1 month ago

@cpakulski It has been a long time since I ran these regression tests: https://github.com/fabriziomello/envoy-postgres-regression. They need some rework to run properly again, and to avoid wasting resources on building Envoy.

One naive question: are those -dev Docker images a kind of nightly build?

cpakulski commented 1 month ago

@fabriziomello I think that this is what you are looking for: https://hub.docker.com/layers/envoyproxy/envoy/contrib-debug-dev/images/sha256-770fe2414e700156673389677dedca96133fb7b69cb2d9a222253e15fd25b11f?context=explore

github-actions[bot] commented 19 hours ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

cpakulski commented 19 hours ago

I am still planning to investigate the cause.