envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Configuring an ApiConfigSource on a dynamic cluster #12954

Open bigdefect opened 4 years ago

bigdefect commented 4 years ago

Title: Is it possible to configure ApiConfigSource to point to a cluster from CDS?

Description:

We'd like to specify a STATIC cluster that points to a unix domain socket for SDS, but deliver it via xds. While setting up a POC, we came across an error that suggests the above isn't possible.

Given this cluster for the unix socket:

"dynamic_active_clusters": [
 {
  "cluster": {
    "@type": "type.googleapis.com/envoy.api.v2.Cluster",
    "name": "static_uds_cluster",
    "type": "STATIC",
    "load_assignment": {
     "cluster_name": "static_uds_cluster",
     "endpoints": [
      {
       "lb_endpoints": [
        {
         "endpoint": {
          "address": {
           "pipe": {
            "path": "/path/to/agent.sock"
           }
          }
...

And this listener (for example) that points to the above cluster for SDS:

"@type": "type.googleapis.com/envoy.api.v2.Listener",
 ...
 "filter_chains": [
  {
   ...
   "tls_context": {
    "common_tls_context": {
     "tls_certificate_sds_secret_configs": [
      {
       "name": "secret_name",
       "sds_config": {
        "api_config_source": {
         "api_type": "GRPC",
         "grpc_services": [
          {
           "envoy_grpc": {
            "cluster_name": "static_uds_cluster"
           }
          }
          ...

This leads to

envoy.config.core.v3.ApiConfigSource must have a statically defined non-EDS cluster: 'static_uds_cluster' does not exist, was added via api, or is an EDS cluster

That error is thrown here: https://github.com/envoyproxy/envoy/blob/release/v1.15/source/common/config/utility.cc#L144

And based on the explanation of primary clusters: https://github.com/envoyproxy/envoy/blob/release/v1.15/source/common/upstream/cluster_manager_impl.cc#L274-L280

I'm inclined to come to the conclusion that this is to prevent dependencies across dynamic listeners and clusters. I'd like to verify that this is an intended hard limit and that the cluster must be defined statically via bootstrap.
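For reference, the bootstrap-based workaround would define the UDS cluster statically rather than via CDS. A minimal sketch (the connect_timeout value is illustrative; http2_protocol_options is set because the SDS connection is gRPC):

```json
{
  "static_resources": {
    "clusters": [
      {
        "name": "static_uds_cluster",
        "type": "STATIC",
        "connect_timeout": "1s",
        "http2_protocol_options": {},
        "load_assignment": {
          "cluster_name": "static_uds_cluster",
          "endpoints": [
            {
              "lb_endpoints": [
                {
                  "endpoint": {
                    "address": {
                      "pipe": { "path": "/path/to/agent.sock" }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
```

With the cluster in static_resources, the ApiConfigSource primary-cluster check passes and the dynamic listener's sds_config can reference it by name.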

bigdefect commented 4 years ago

I tried switching it from EnvoyGrpc to GoogleGrpc, and can at least confirm that the config is ACK'd. So I removed the UDS cluster and changed the sds config to:

"tls_certificate_sds_secret_configs": [
  {
   "name": "spiffe://mesh.com/frontend",
   "sds_config": {
    "api_config_source": {
     "api_type": "GRPC",
     "grpc_services": [
      {
       "google_grpc": {
        "target_uri": "/tmp/agent.sock",
        "stat_prefix": "sdsuds"
       }
      }

I'm not familiar with the difference between the clients.

mattklein123 commented 4 years ago

cc @htuch @adisuissa

htuch commented 4 years ago

I believe this is a limit we introduced to simplify some of the bootstrap dependency issues across cluster initialization and warming. @yanavlasov probably has some of the latest context on that. I think we could relax this constraint in some cases, in particular when the config source is not for cluster endpoints.

kyessenov commented 4 years ago

This is a bummer for federated xDS. Delegation dependency must be declared in the bootstrap.
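Concretely, the delegation today has to be anchored in the bootstrap: a statically declared management-server cluster (the name, address, and port below are hypothetical) that dynamic config sources then reference:

```json
{
  "dynamic_resources": {
    "cds_config": {
      "api_config_source": {
        "api_type": "GRPC",
        "grpc_services": [
          { "envoy_grpc": { "cluster_name": "xds_cluster" } }
        ]
      }
    }
  },
  "static_resources": {
    "clusters": [
      {
        "name": "xds_cluster",
        "type": "STRICT_DNS",
        "connect_timeout": "1s",
        "http2_protocol_options": {},
        "load_assignment": {
          "cluster_name": "xds_cluster",
          "endpoints": [
            {
              "lb_endpoints": [
                {
                  "endpoint": {
                    "address": {
                      "socket_address": { "address": "xds.example.com", "port_value": 18000 }
                    }
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
```

An xDS server delivered dynamically (e.g. via CDS) cannot itself back an ApiConfigSource, which is what blocks the federated/delegated setup.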

bigdefect commented 4 years ago

While I'm waiting for confirmation from @yanavlasov, I'll also point out one mildly interesting thing I noticed.

If the config looks something like this:

"sds_config": {
  "api_config_source": {
   "api_type": "GRPC",
   "grpc_services": [
    {
     "envoy_grpc": {
      "cluster_name": "static_uds_cluster"
     }
    }

The validation of the cluster name is actually performed on

"sds_config": {
  "api_config_source": {
   "cluster_names": [
     "static_uds_cluster"
   ]
   ...

i.e. ConfigSource.ApiConfigSource.GrpcService.EnvoyGrpc.cluster_name versus ConfigSource.ApiConfigSource.cluster_names. I don't know enough about Envoy internals, but it seems like the cluster names are being hoisted up before validation is performed. I don't know if that has a material impact here.

yanavlasov commented 4 years ago

Sorry for the delay in responding. It's been a while, but as I recall Envoy does not allow this type of dependency. In this case, as I understand it, the chain is LDS->SDS->CDS. Conceptually it should be possible to make it work; it just has not been done.

bigdefect commented 4 years ago

Understood, thanks! Do you have any comment on the hoisting behavior I described above? Just out of curiosity.

As for this issue, I think I have my answer: we'll need to either pursue the static bootstrap cluster or use the GoogleGrpc client. If it makes sense to keep this open as a feature request, feel free; otherwise I'm good.

htuch commented 4 years ago

Re: https://github.com/envoyproxy/envoy/issues/12954#issuecomment-692207481, the top-level cluster_names only applies to REST (see comment on the field). It should prefer the gRPC cluster name if you skip configuring top-level cluster_names; I'd say this is a bug and we should opt for gRPC cluster name preferentially if present.

Re: the original issue. I think this should remain open as a feature request. It's definitely valid to relax this constraint in almost all situations. Where you run into trouble is with chains like LDS->SDS->CDS->EDS: I think this would work today with the check removed, but it would also have broken warming behavior, as AFAIK we don't have the init manager warming plumbing in place.

For now I suggest you pursue the workarounds you have identified.

radnair commented 2 years ago

> I tried switching it from EnvoyGrpc to GoogleGrpc, and can at least confirm that the config is ACK'd. [...] I'm not familiar with the difference between the clients.

I am facing something similar right now; would you know if there are any other changes that need to be made to get this working?

ramaraochavali commented 7 hours ago

Can we reopen this? This is still a problem for SDS

ramaraochavali commented 3 hours ago

> but it would also have broken warming behavior, as AFAIK we don't have the init manager warming plumbing in place.

@htuch @adisuissa With https://github.com/envoyproxy/envoy/pull/21600, do we still need to worry about warming? I am trying to relax this for secrets first, since that seems to be the most common case, to see how it works: https://github.com/envoyproxy/envoy/pull/36694