envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.99k stars 4.81k forks source link

Delta ECDS holds envoy from being ready #30379

Closed hanxiaop closed 10 months ago

hanxiaop commented 1 year ago

If you are reporting any crash or any potential security issue, do not open an issue in this repo. Please report the issue via emailing envoy-security@googlegroups.com where the issue will be triaged appropriately.

Title: One line description

When initializing the envoy, I have some configs along with the delta ecds config:

 istioctl pc ecds httpbin-65975d4c6f-lrzzw -ojson
{
    "ecdsFilters":  [
        {
            "versionInfo":  "2023-10-12T07:00:28Z/4",
            "ecdsFilter":  {
                "@type":  "type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig",
                "name":  "default.hello",
                "typedConfig":  {
                    "@type":  "type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm",
                    "config":  {
                        "name":  "default.hello",
                        "rootId":  "add_header",
                        "vmConfig":  {
                            "runtime":  "envoy.wasm.runtime.v8",
                            "code":  {
                                "local":  {
                                    "filename":  "/var/lib/istio/data/62824cbf8366e3098287bfabcae120ad0b2d828d0a7b4727291c1f0273982a4a/925429df5ad979b3e8a3742d7d6e4a5d3ceaeacdda4465c57dc42849f3594bd5.wasm"
                                }
                            }
                        },
                        "configuration":  {}
                    }
                }
            },
            "lastUpdated":  "2023-10-12T07:00:33.860Z"
        }
    ]
}

the envoy is always in the initializing state without being ready. Also to note: some of the config in my initial ecds config is missing.

Description:

What issue is being seen? Describe what should be happening instead of the bug, for example: Envoy should not crash, the expected value isn't returned, etc.

Envoy should not crash, should be in the ready state with the correct ecds config.

Repro steps:

Include sample requests, environment, etc. All data and inputs required to reproduce the bug.

I'm using Istio to configure envoy, basically the steps are to have all the config with delta ecds, and when all the configs are sent to envoy, the envoy is always in the initializing state without being ready.

Note: The Envoy_collect tool gathers a tarball with debug logs, config and the following admin endpoints: /stats, /clusters and /server_info. Please note if there are privacy concerns, sanitize the data prior to sharing the tarball/pasting.

Admin and Stats Output:

Include the admin output for the following endpoints: /stats, /clusters, /routes, /server_info. For more information, refer to the admin endpoint documentation.

Note: If there are privacy concerns, sanitize the data prior to sharing.

Config:

Include the config used to configure Envoy.

Other configs are shown in the config dump I provided, and the correct ecds configis :

 istioctl pc ecds httpbin-65975d4c6f-lrzzw -ojson
{
    "ecdsFilters":  [
        {
            "versionInfo":  "2023-10-12T07:00:28Z/4",
            "ecdsFilter":  {
                "@type":  "type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig",
                "name":  "default.hello",
                "typedConfig":  {
                    "@type":  "type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm",
                    "config":  {
                        "name":  "default.hello",
                        "rootId":  "add_header",
                        "vmConfig":  {
                            "runtime":  "envoy.wasm.runtime.v8",
                            "code":  {
                                "local":  {
                                    "filename":  "/var/lib/istio/data/62824cbf8366e3098287bfabcae120ad0b2d828d0a7b4727291c1f0273982a4a/925429df5ad979b3e8a3742d7d6e4a5d3ceaeacdda4465c57dc42849f3594bd5.wasm"
                                }
                            }
                        },
                        "configuration":  {}
                    }
                }
            },
            "lastUpdated":  "2023-10-12T07:00:33.860Z"
        }
    ]
}

The actual ecds config has some fields missing:

{
   "@type": "type.googleapis.com/envoy.admin.v3.EcdsConfigDump",
   "ecds_filters": [
    {
     "ecds_filter": {
      "@type": "type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig",
      "name": "bookinfo.hello"
     },
     "last_updated": "2023-10-13T08:47:48.066Z"
    }
   ]
  },

The init config is:

{
"unready_targets_dumps": [
{
"name": "init manager Listener-local-init-manager 89c6faa8-a2f1-448a-afba-72988d536429 2200412303323231031"
},
{
"name": "init manager Listener-local-init-manager 6d3fca7c-d1dc-47fe-801a-0188e8c7fd79 9872086206848846795"
},
{
"name": "init manager Listener-local-init-manager connect_terminate 13400082664458833867"
},
{
"name": "init manager Listener-local-init-manager main_internal 13582092048675651674",
"target_names": [
"shared target FilterConfigSubscription init bookinfo.hello"
]
},
{
"name": "init manager Listener-local-init-manager connect_originate 10550062674463787435"
}
]
}

configdump.json

Logs:

Include the access logs and the Envoy logs.

wp.log

Note: If there are privacy con cerns, sanitize the data prior to sharing.

Call Stack:

If the Envoy binary is crashing, a call stack is required. Please refer to the Bazel Stack trace documentation.

alyssawilk commented 1 year ago

cc @adisuissa @htuch

adisuissa commented 1 year ago

Thanks for filing this bug. Some clarifying questions:

  1. Is it correct to assume that this doesn't happen if the same filter-config is statically defined?
  2. Is it possible to test this with State-of-the-World? - FWIW, State-of-the-world and delta shouldn't be an issue in this case, but I just want to cover all the use-cases.
hanxiaop commented 1 year ago

@adisuissa Sorry for the late reply.

  1. Is it correct to assume that this doesn't happen if the same filter-config is statically defined?

This is correct.

  1. Is it possible to test this with State-of-the-World? - FWIW, State-of-the-world and delta shouldn't be an issue in this case, but I just want to cover all the use-cases.

I think State-of-the-World works in this case. Completely removing delta makes envoy work as expected. The situation I observed is as follows: if we have envoy ready and later push with a delta ECDS, envoy works fine. However, if envoy is in an initializing state and we push a delta ECDS, the error I mentioned occurs. If it doesn't make sense, I can provide more test results/details.

kyessenov commented 1 year ago

This might be a bug in delta client for ECDS if I recall. It only handles a single add/remove I think. Do you have trace/debug logs from Envoy to see if ECDS is rejected for some reason?

hanxiaop commented 1 year ago

I've attached the trace log.

wptrace.txt

BTW from the init config dump it shows:

{
"unready_targets_dumps": [
...
{
"name": "init manager Listener-local-init-manager main_internal 9736802669137004450",
"target_names": [
"shared target FilterConfigSubscription init bookinfo.hello"
]
}
]
}

Not sure if it's related.

hanxiaop commented 1 year ago

@kyessenov I've tested again, and it affects both istio sidecar and waypoint when delta xDS is enabled. If using delta xDS without ECDS, everything works smoothly. However, if sending a delta ECDS config along with other delta configs before envoy is ready, envoy will never reach a ready state, and if sending a delta ECDS config after envoy is ready, envoy only contains the name, without the vm_config:

{
"@type": "type.googleapis.com/envoy.admin.v3.EcdsConfigDump",
"ecds_filters": [
{
"ecds_filter": {
"@type": "type.googleapis.com/envoy.config.core.v3.TypedExtensionConfig",
"name": "default.hello"
},
"last_updated": "2023-11-10T03:26:48.882Z"
}
]
},

it's a config still not wroking as expected.

BTW stow doesn't have this issue, and everything works good.

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 10 months ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

howardjohn commented 9 months ago

@hanxiaop is this still an issue

hanxiaop commented 9 months ago

@hanxiaop is this still an issue

@howardjohn Will try tomorrow to see what is happening.