envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Envoy NACKs on a valid IPv6 address #29902

Open suniltheta opened 11 months ago

suniltheta commented 11 months ago

Description:

We ran into an unusual issue. I considered whether it is caused by the underlying host not supporting IPv6, or by running in AWS bridge mode, but I have not been able to reproduce it locally.

Envoy v1.27 sent the following NACK:

failureReason:Error adding/updating listener(s) egress: malformed IP address: 2600:f0f0:0:0:0:0:0:1

According to the NACK, Envoy could not validate this as a proper IPv6 address. The address was provided in a filter chain match like this:

         "filter_chain_match": {
          "prefix_ranges": [
           {
            "address_prefix": "127.255.0.1",
            "prefix_len": 32
           },
           {
            "address_prefix": "2600:f0f0:0:0:0:0:0:1",
            "prefix_len": 128
           }
          ],

I believe the IPv6 address validation is done in https://github.com/envoyproxy/envoy/blob/83e604abd8214f379617e6320d2255ea20ca0e1f/source/common/network/utility.cc#L117
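
As a sanity check that is independent of both Envoy and the host's libc, the two `prefix_ranges` entries above can be parsed with Python's standard `ipaddress` module. This is only a sketch to confirm that the address text is well formed; it does not reproduce Envoy's own validation path, and `describe_prefix` is a hypothetical helper name.

```python
import ipaddress

# The two entries copied from the filter_chain_match in this report.
PREFIX_RANGES = [
    {"address_prefix": "127.255.0.1", "prefix_len": 32},
    {"address_prefix": "2600:f0f0:0:0:0:0:0:1", "prefix_len": 128},
]


def describe_prefix(entry):
    """Parse one prefix_ranges entry; raises ValueError if it is malformed."""
    addr = ipaddress.ip_address(entry["address_prefix"])
    # strict=False tolerates host bits set below the prefix length.
    net = ipaddress.ip_network(f"{addr}/{entry['prefix_len']}", strict=False)
    return addr.version, str(net)


for entry in PREFIX_RANGES:
    print(describe_prefix(entry))
```

Both entries parse without error, and the IPv6 one normalizes to `2600:f0f0::1/128`, which suggests the failure is environmental rather than a problem with the address string itself.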

Repro steps: I cannot reproduce this; it was observed in a customer's environment.

Config: Same as the one mentioned above.

{
     "name": "egress",
     "active_state": {
      "version_info": "1",
      "listener": {
       "@type": "type.googleapis.com/envoy.config.listener.v3.Listener",
       "name": "egress",
       "address": {
        "socket_address": {
         "address": "127.0.0.1",
         "port_value": 59632
        }
       },
       "filter_chains": [
        {
         "filter_chain_match": {
          "prefix_ranges": [
           {
            "address_prefix": "127.255.0.1",
            "prefix_len": 32
           },
           {
            "address_prefix": "2600:f0f0:0:0:0:0:0:1",
            "prefix_len": 128
           }
          ],
          "destination_port": 8156
         },
         "filters": [
         ....

Logs:

Logs on Control Plane

26 09 2023 20:20:50,731 [WARN] envoy.common.EnvoyNackHandler: Received NACK(failureReason:Error adding/updating listener(s) egress: malformed IP address: 2600:f0f0:0:0:0:0:0:1) for resource(type:MetaType(identity=envoy.listener), version:, nonce:488d0f4a-425a-467c-99ac-1ddd0a0e9543, names:[]) from Envoy(version:v1.27.0.0)

zuercher commented 11 months ago

I think we need more information to determine whether this is anything that we'd want to address. Do you know what OS, version, etc. was in use?

suniltheta commented 11 months ago

OS: AL2023 (based on Fedora Linux distribution), kernel 6.1.52-71.125.amzn2023.x86_64

It was using AWS ECS networking in bridge mode. I am not sure whether this network mode would influence the filter_chain_match IPs.

zuercher commented 11 months ago

Looking at the code it seems this must be getaddrinfo returning an error. That's about as far as I can get with the time I have.
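
One way to test that theory on an affected host is to call the system `getaddrinfo` directly with the same address. The sketch below uses Python's `socket.getaddrinfo`, which wraps the libc call. The hints are an assumption (numeric-only lookup via `AI_NUMERICHOST`, unspecified family) and may not match what Envoy passes, and `check_numeric_address` is a hypothetical helper name.

```python
import socket


def check_numeric_address(address):
    """Return (ok, detail) for a numeric-only getaddrinfo lookup of `address`."""
    try:
        results = socket.getaddrinfo(
            address,
            None,
            family=socket.AF_UNSPEC,
            type=socket.SOCK_DGRAM,
            # Assumption: numeric-only parsing, no DNS lookup.
            flags=socket.AI_NUMERICHOST,
        )
    except socket.gaierror as err:
        # The error text comes from libc's gai_strerror for the failing code.
        return False, f"getaddrinfo failed: {err}"
    family, _, _, _, sockaddr = results[0]
    return True, f"{socket.AddressFamily(family).name} {sockaddr[0]}"


if __name__ == "__main__":
    for candidate in ("127.255.0.1", "2600:f0f0:0:0:0:0:0:1"):
        print(candidate, check_numeric_address(candidate))
```

If the IPv6 entry reports a failure when this is run inside the same container image on the affected host, the returned error string would narrow down which `getaddrinfo` error is behind the NACK.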

suniltheta commented 11 months ago

Thanks for taking a look.

If getaddrinfo is provided by glibc, I want to understand what happens when Envoy calls Api::OsSysCallsSingleton::get().getaddrinfo(): is the call handled by a library already included in the static Envoy binary, or does Envoy call into the underlying OS, so that the glibc functionality has to be supported by the OS itself?

The custom Envoy image used above, including the binary, is built on Amazon Linux 2 (CentOS flavor), but the container runs on an Amazon Linux 2023 host, which is Fedora based. So I wonder whether some incompatibility could (rarely) cause this problem.

zuercher commented 11 months ago

From my experience, glibc is the one library that Envoy links dynamically. If the OS version is older than the compile version it'll fail to start.
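
To compare the glibc available at run time with the one used at build time, a small check like the following could be run inside the container image. It is a sketch only: `platform.libc_ver()` reports the C library of the Python interpreter it runs under, not of the Envoy binary, so it is meaningful only if both share the same image. The helper names and the version strings in the usage lines are illustrative assumptions, not values taken from this report.

```python
import platform


def runtime_libc():
    """Return (name, version) of the C library the Python interpreter uses."""
    return platform.libc_ver()


def parse_version(text):
    """Turn a dotted version such as '2.34' into a comparable tuple (2, 34)."""
    return tuple(int(part) for part in text.split("."))


def runtime_older_than_build(runtime_version, build_version):
    """True when the run-time glibc is older than the build-time glibc."""
    return parse_version(runtime_version) < parse_version(build_version)


if __name__ == "__main__":
    print("runtime libc:", runtime_libc())
    # Illustrative version strings only; substitute the real ones.
    print(runtime_older_than_build("2.26", "2.34"))
```

A `True` result from `runtime_older_than_build` corresponds to the case described above, where the binary would be expected to fail at startup rather than NACK a single address.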