envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Crash on cluster initialization/healthcheck when ipv6 disabled #19992

Open smock514 opened 2 years ago

smock514 commented 2 years ago

This crash was already cleared with envoy-security@; they OKed filing it as an issue.

Title: Crash on cluster initialization/healthcheck when ipv6 disabled

Description

[seanm514@286987d3 ~]$ /opt/vz/bin/envoy --version
/opt/vz/bin/envoy  version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Yahoo/RELEASE/BoringSSL

Repro steps

  1. Stand up RHEL instance (tested against 7.9.21 and 8.5.1)
  2. Install envoy (tested against 1.20.1 and reproduced with similar configs in 1.9.1)
  3. Create a test Envoy configuration that defines a LOGICAL_DNS cluster pointing to an endpoint with an IPv6 address (yahoo.com in this case, but any domain that resolves to an IPv6 address should work)
  4. Disable IPv6 via GRUB (add ipv6.disable=1 to GRUB_CMDLINE_LINUX and regenerate grub.cfg, as shown in Detailed Steps)
  5. Reboot instance
  6. Reproduce crash by starting envoy with the test configuration

Detailed Steps

<!-- SSH to RHEL instance -->

[seanm514@286987d3 ~]$ cat /etc/redhat-release
Red Hat Enterprise Linux release 8.5 (Ootpa)

[seanm514@286987d3 ~]$ sudo yum install envoy-1.20.1
Last metadata expiration check: 0:00:06 ago on Wed 09 Feb 2022 06:05:39 PM UTC.
Dependencies resolved.
================================================================================
 Package             Arch   Version                    Repository          Size
================================================================================
Installing:
 envoy               x86_64 1.20.1-1.el8               media_edge-release  17 M
...
Installed:
  envoy-1.20.1-1.el8.x86_64
...

[seanm514@286987d3 ~]$ /opt/vz/bin/envoy --version
/opt/vz/bin/envoy  version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Yahoo/RELEASE/BoringSSL

[seanm514@286987d3 ~]$ cat /tmp/ipv6_test.yaml
static_resources:
  clusters:
  - name: service_test
    connect_timeout: 1s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_test
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                protocol: TCP
                address: yahoo.com
                port_value: 443

[seanm514@286987d3 ~]$ ping -6 yahoo.com
PING yahoo.com(media-router-fp74.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f001)) 56 data bytes
64 bytes from media-router-fp74.prod.media.vip.bf1.yahoo.com (2001:4998:124:1507::f001): icmp_seq=1 ttl=44 time=90.3 ms

[seanm514@286987d3 ~]$ sudo vi /etc/default/grub

[seanm514@286987d3 ~]$ cat /etc/default/grub | grep GRUB_CMDLINE_LINUX
GRUB_CMDLINE_LINUX="crashkernel=auto console=tty0 console=ttyS0,115200n8 net.ifnames=0 ipv6.disable=1"

[seanm514@286987d3 ~]$ sudo /usr/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.18.0-348.12.2.el8_5.x86_64
Found initrd image: /boot/initramfs-4.18.0-348.12.2.el8_5.x86_64.img
done

<!-- REBOOT INSTANCE -->
<!-- SSH back to RHEL instance -->

[seanm514@286987d3 ~]$ ping -6 yahoo.com
ping: socket: Address family not supported by protocol

[seanm514@286987d3 ~]$ /opt/vz/bin/envoy -c /tmp/ipv6_test.yaml
[2022-02-09 18:14:34.326][1469][info][main] [source/server/server.cc:368] initializing epoch 0 (base id=0, hot restart version=11.104)
...
<see logs below>
Aborted (core dumped)

Config

static_resources:
  clusters:
  - name: service_test
    connect_timeout: 1s
    type: LOGICAL_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_test
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                protocol: TCP
                address: yahoo.com
                port_value: 443

Logs

[seanm514@286987d3 ~]$ /opt/vz/bin/envoy -c /tmp/ipv6_test.yaml
[2022-02-09 18:14:34.326][1469][info][main] [source/server/server.cc:368] initializing epoch 0 (base id=0, hot restart version=11.104)
...
[2022-02-09 18:14:34.332][1469][info][main] [source/server/server.cc:740] runtime: {}
[2022-02-09 18:14:34.332][1469][warning][main] [source/server/server.cc:585] No admin address given, so no admin HTTP server started.
[2022-02-09 18:14:34.332][1469][info][config] [source/server/configuration_impl.cc:127] loading tracing configuration
[2022-02-09 18:14:34.332][1469][info][config] [source/server/configuration_impl.cc:87] loading 0 static secret(s)
[2022-02-09 18:14:34.332][1469][info][config] [source/server/configuration_impl.cc:93] loading 1 cluster(s)
[2022-02-09 18:14:34.333][1469][info][config] [source/server/configuration_impl.cc:97] loading 0 listener(s)
[2022-02-09 18:14:34.333][1469][info][config] [source/server/configuration_impl.cc:109] loading stats configuration
[2022-02-09 18:14:34.333][1469][info][main] [source/server/server.cc:836] starting main dispatch loop
[2022-02-09 18:14:34.335][1469][critical][main] [source/exe/terminate_handler.cc:12] std::terminate called! (possible uncaught exception, see trace)
[2022-02-09 18:14:34.335][1469][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2022-02-09 18:14:34.335][1469][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Yahoo/RELEASE/BoringSSL
[2022-02-09 18:14:34.343][1469][critical][backtrace] [./source/server/backtrace.h:96] #0: Envoy::TerminateHandler::logOnTerminate()::$_0::operator()() [0x555fb19d491f]
[2022-02-09 18:14:34.350][1469][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x555fb19d46a9]
[2022-02-09 18:14:34.356][1469][critical][backtrace] [./source/server/backtrace.h:96] #2: __cxxabiv1::__terminate() [0x555fb204ac7c]
[2022-02-09 18:14:34.356][1469][critical][backtrace] [./source/server/backtrace.h:104] Caught Aborted, suspect faulting address 0x5327000005bd
[2022-02-09 18:14:34.356][1469][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2022-02-09 18:14:34.356][1469][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: ea23f47b27464794980c05ab290a3b73d801405e/1.20.1/Yahoo/RELEASE/BoringSSL
[2022-02-09 18:14:34.357][1469][critical][backtrace] [./source/server/backtrace.h:96] #0: __restore_rt [0x7f7911650c20]
[2022-02-09 18:14:34.363][1469][critical][backtrace] [./source/server/backtrace.h:98] #1: [0x555fb19d46a9]
[2022-02-09 18:14:34.363][1469][critical][backtrace] [./source/server/backtrace.h:96] #2: __cxxabiv1::__terminate() [0x555fb204ac7c]
Aborted (core dumped)

Call Stack

See Logs above

Mitigations

The crash can be avoided by setting dns_lookup_family to V4_ONLY on the cluster.

However, it is somewhat brittle to rely on manually configuring V4_ONLY for each new cluster we add, so this issue seemed worth reporting.
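
For illustration, here is a minimal sketch of what the V4_ONLY workaround might look like when applied to the test configuration above. It assumes the cluster-level dns_lookup_family field is the setting being referred to. This is not necessarily the exact config used in production, and it has not been re-verified against the affected build.

```yaml
static_resources:
  clusters:
  - name: service_test
    connect_timeout: 1s
    type: LOGICAL_DNS
    # Assumed mitigation: only look up IPv4 addresses for this cluster,
    # so no IPv6 address is ever selected for the endpoint.
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_test
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                protocol: TCP
                address: yahoo.com
                port_value: 443
```

Every new cluster would need the same line added by hand, which is the brittleness noted above.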

KBaichoo commented 2 years ago

cc @alyssawilk @zuercher as cluster experts

alyssawilk commented 2 years ago

yeah, this was discussed on setec (thanks for responsible reporting!) and we agreed it should be fixed when someone has the cycles.