envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Postgres proxy: Envoy Crash during pgbench test #36471

Open shiponcs opened 1 month ago

shiponcs commented 1 month ago

Title: Envoy Crash during pgbench test

Description: We ran pgbench against a Postgres server proxied by Envoy. During the test, Envoy crashed with the following log:

[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:127] Caught Segmentation fault, suspect faulting address 0x0
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:111] Backtrace (use tools/stack_decode.py to get line numbers):
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:112] Envoy version: ca7ce529646b1f7c8c760e0e249e285cee4549ed/1.30.2/Modified/RELEASE/BoringSSL
[2024-09-19 14:22:51.227][42][critical][backtrace] [./source/server/backtrace.h:114] Address mapping: 560479564000-56047cf63000 /usr/local/bin/envoy
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #0: [0x7fa621681520]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #1: [0x56047b34bfc4]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #2: [0x56047b34949a]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #3: [0x56047b348d88]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #4: [0x56047b345b6d]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #5: [0x56047b3304fa]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #6: [0x56047b32fd00]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #7: [0x56047b3443e7]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #8: [0x56047bb7aec5]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #9: [0x56047bb72663]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #10: [0x56047bb6f2f8]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #11: [0x56047bb4d2f1]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #12: [0x56047bb4e89d]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #13: [0x56047ca79480]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #14: [0x56047ca77dc1]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #15: [0x56047ad6332f]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #16: [0x56047cafac63]
[2024-09-19 14:22:51.228][42][critical][backtrace] [./source/server/backtrace.h:121] #17: [0x7fa6216d3ac3]
ConnectionImpl 0x26537f7fe640, connecting_: 0, bind_error_: 0, state(): Open, read_buffer_limit_: 32768
socket_: 
  ListenSocketImpl 0x26537f0aeab0, transport_protocol_: raw_buffer
  connection_info_provider_: 
    ConnectionInfoSetterImpl 0x26537f09e518, remote_address_: 100.126.0.213:54396, direct_remote_address_: 100.126.0.213:54396, local_address_: 100.126.5.153:50815, server_name_:

Testing scope:
Envoy configuration: https://gist.github.com/shiponcs/600fff61cddba5911bc3b9e538a4fc8c
pgbench command: pgbench -h localhost -p 50815 -U postgres -c 50 -j 1 -T 00 example

Relevant: We tried to find the cause. Running git bisect showed that this commit introduces the crash; if we revert the changes from that commit, the issue goes away.

alyssawilk commented 1 month ago

Thanks for running this by Envoy security before posting in the clear (with approval).

@fabriziomello @cpakulski can you take a look? I don't think we should necessarily roll back an Envoy build config change based on a broken contrib extension.

cpakulski commented 1 month ago

Thanks for reporting it. I will try to repro the crash to find the cause.

> I don't think we should necessarily roll back an Envoy build config change based on a broken contrib extension.

I agree. I think that the build config probably just exposes the problem but is not the cause of the crash.

fabriziomello commented 1 month ago

@cpakulski It has been a long time since I ran these regression tests: https://github.com/fabriziomello/envoy-postgres-regression. They need some rework to run properly again and to avoid spending resources on building Envoy.

One naive question: are those -dev Docker images a kind of nightly build?

cpakulski commented 1 month ago

@fabriziomello I think that this is what you are looking for: https://hub.docker.com/layers/envoyproxy/envoy/contrib-debug-dev/images/sha256-770fe2414e700156673389677dedca96133fb7b69cb2d9a222253e15fd25b11f?context=explore

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

cpakulski commented 2 weeks ago

I am still planning to investigate the cause.

caoyukun0430 commented 1 week ago

We also reported a similar segmentation fault recently. Is there any update on the cause? Thanks a lot!

cpakulski commented 1 week ago

It is under investigation. I am building a test bed.

caoyukun0430 commented 1 week ago

> It is under investigation. I am building a test bed.

Hi @cpakulski, thank you very much for taking the time to debug this! Have you managed to reproduce the issue? Please let us know if you need anything from us. For example, we can try to reproduce it on our side and provide a proper backtrace and core dump if that would speed up the debugging.

cpakulski commented 1 week ago

@caoyukun0430 Yes, I just managed to repro it. The crash is related to the external SQL parsing library. If you do not care about parsing SQL statements, add the following line to your config as a mitigation:

      - name: envoy.filters.network.postgres_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.postgres_proxy.v3alpha.PostgresProxy
          stat_prefix: egress_pg
          enable_sql_parsing: false   # <-- Add this line

If you do care about parsing SQL statements, you need to wait until I fix it in the sqlparser library.

Parsing is required only if you want the filter to create metadata and later act on that metadata, for example by feeding it into RBAC. Otherwise you do not need it.
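
For context, here is a minimal sketch of where that line sits in a complete static configuration. It is not the reporter's actual config (that is in the gist linked above). The listener name, cluster name, `stat_prefix` of the TCP proxy, and the upstream address `postgres.example.internal:5432` are all assumptions made for illustration; only the listener port 50815 and the `postgres_proxy` settings come from this thread.

```yaml
static_resources:
  listeners:
  - name: postgres_listener            # hypothetical name
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 50815              # port used by the pgbench command in this issue
    filter_chains:
    - filters:
      # The Postgres filter only inspects traffic; it must be followed by tcp_proxy.
      - name: envoy.filters.network.postgres_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.postgres_proxy.v3alpha.PostgresProxy
          stat_prefix: egress_pg
          enable_sql_parsing: false    # mitigation: skip the external SQL parser
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: postgres_tcp    # hypothetical prefix
          cluster: postgres_cluster
  clusters:
  - name: postgres_cluster             # hypothetical name
    type: STRICT_DNS
    connect_timeout: 1s
    load_assignment:
      cluster_name: postgres_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: postgres.example.internal   # assumed upstream host
                port_value: 5432
```

With SQL parsing disabled, the filter should still decode the Postgres wire protocol, but it no longer produces the per-statement metadata, so any RBAC rules that match on that metadata would stop matching.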