envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
24.74k stars 4.75k forks source link

SDS onConfigUpdate ignores resource version #7676

Open mosesn opened 5 years ago

mosesn commented 5 years ago

Description: I’m setting up SDS over REST-JSON and we’re having trouble getting envoy to acknowledge that a cert has changed when we send back a message to try to indicate it should re-read the certs. We're sending a Secret-type resource that we're configuring like this:

    Secret.newBuilder()
      .setName(resourceName)
      .setTlsCertificate(TlsCertificate.newBuilder()
        .setCertificateChain(DataSource.newBuilder().setFilename(certPath))
        .setPrivateKey(DataSource.newBuilder().setFilename(privateKeyPath))
    ).build()

And since the certPath and privateKeyPath don't change (only the contents have changed), the resource is the same as before.

I noticed that I’m not seeing this log message: "ENVOY_LOG(debug, "Secret is updated.");" from https://github.com/envoyproxy/envoy/blob/v1.10.0/source/extensions/transport_sockets/tls/ssl_socket.cc#L495 when it replies, even though I'm running at debug log level.

I would expect the /certs endpoint to reflect that the cert has changed, and I would expect it to use the new cert when trying to talk to the upstream (I haven't checked whether it also behaves this way for the downstream). Neither of those seems to be happening.

We're using envoy v1.10.0

Repro steps: Spin up an envoy service with a config file pointing to an SDS service speaking REST-JSON. Change the cert and key that it's using for an upstream, and reply with a Secret that tells it to inspect the same path it was using before.

Admin and Stats Output:

$ curl -s localhost:9900/stats | grep sds
cluster.$CLUSTER_NAME.client_ssl_socket_factory.ssl_context_update_by_sds: 1
listener.0.0.0.0_$PORT.server_ssl_socket_factory.ssl_context_update_by_sds: 1

Config:

          "tls_certificate_sds_secret_configs": [{
            "name": "$RESOURCE_NAME",
            "sds_config": {
              "api_config_source": {
                "api_type": "REST",
                "cluster_names": ["$CLUSTER_NAME"],
                "refresh_delay": "0.001s",
                "request_timeout": "86400s"
              }
            }
          }]

Logs:

[2019-07-22 15:12:28.161][162526][debug][router] [external/envoy/source/common/router/router.cc:717] [C0][S17316378957337769216] upstream headers complete: end_stream=false
[2019-07-22 15:12:28.161][162526][debug][http] [external/envoy/source/common/http/async_client_impl.cc:94] async http request response headers (end_stream=false):
':status', '200'
'content-length', '454'
'x-envoy-upstream-service-time', '352260'

[2019-07-22 15:12:28.161][162526][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C1] response complete
[2019-07-22 15:12:28.161][162526][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:202] [C1] response complete
[2019-07-22 15:12:28.161][162526][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:240] [C1] moving to ready
[2019-07-22 15:12:28.163][162526][debug][config] [bazel-out/k8-fastbuild/bin/external/envoy/source/common/config/_virtual_includes/http_subscription_lib/common/config/http_subscription_impl.h:78] Sending REST request for /v2/discovery:secrets
[2019-07-22 15:12:28.163][162526][debug][router] [external/envoy/source/common/router/router.cc:320] [C0][S16990936995685964819] cluster 'zerofx_helper' match for URL '/v2/discovery:secrets'
[2019-07-22 15:12:28.163][162526][debug][router] [external/envoy/source/common/router/router.cc:381] [C0][S16990936995685964819] router decoding headers:
':method', 'POST'
':path', '/v2/discovery:secrets'
':authority', 'zerofx_helper'
':scheme', 'http'
'content-type', 'application/json'
'content-length', '198'
'x-envoy-internal', 'true'
'x-forwarded-for', '10.70.198.116'
'x-envoy-expected-rq-timeout-ms', '86400000'

[2019-07-22 15:12:28.163][162526][debug][pool] [external/envoy/source/common/http/http1/conn_pool.cc:97] [C1] using existing connection
[2019-07-22 15:12:28.163][162526][debug][router] [external/envoy/source/common/router/router.cc:1165] [C0][S16990936995685964819] pool ready
[2019-07-22 15:12:29.989][162526][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C7] response complete
[2019-07-22 15:12:29.989][162526][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:226] [C7] hc response=200 health_flags=healthy
[2019-07-22 15:12:29.989][162526][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:577] [C7] stream closed: 0
[2019-07-22 15:12:29.992][162526][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C8] response complete
[2019-07-22 15:12:29.992][162526][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:226] [C8] hc response=200 health_flags=healthy
[2019-07-22 15:12:29.992][162526][debug][http2] [external/envoy/source/common/http/http2/codec_impl.cc:577] [C8] stream closed: 0
[2019-07-22 15:12:29.992][162526][debug][client] [external/envoy/source/common/http/codec_client.cc:95] [C10] response complete
[2019-07-22 15:12:29.992][162526][debug][hc] [external/envoy/source/common/upstream/health_checker_impl.cc:226] [C10] hc response=200 health_flags=healthy

Edit: I tried building it in a non-long-polled fashion and it still exhibits the error, so I'm removing all mentions of long-polling from my description. In the config where it says the refresh delay is 1ms, I also tried with a refresh delay of 5s when I'm doing short-polling and it exhibits the same behavior.

mattklein123 commented 5 years ago

cc @lizan

lizan commented 5 years ago

Thanks for the report, I believe this is bug, cc @JimmyCYJ @incfly @PiotrSikora

SdsApi compares protobuf content of the response from SDS, but ignore the resource version, so there is no way to trigger config update if the Secret is exactly same.

incfly commented 5 years ago

+1 a bug.

So the tls_certificate is specified via DataSource.filename, the actual file blob has changed, but we don't bother to re-read. resource version here should be used to indicate the updates and sds is supposed to follow other xds.

JimmyCYJ commented 5 years ago

Thanks for the report. Originally SDS is running on top of gRPC subscription. I guess SDS is using filesystem subscription in this case. We need to add resource version into secret update.

lizan commented 5 years ago

@JimmyCYJ This happens regardless of gRPC subscription or not, when the DataSource has same filename, SDS won't re-read and no way to trigger it to read (which resource version should trigger)

mosesn commented 5 years ago

When you refer to resource version does that mean the version_info in the DiscoveryResponse? Is the idea that I should increment it every time I want to provide an update?

lizan commented 5 years ago

@mosesn yes, and also version in Resource https://github.com/envoyproxy/envoy/blob/master/api/envoy/api/v2/discovery.proto#L226

mosesn commented 5 years ago

Got it. But Resource is only if I use the DeltaDiscovery API, which is only relevant in the incremental xDS case.

In the mean time, it sounds like I can get around this issue by supplying the actual bytes instead of a path? Is that correct?

incfly commented 5 years ago

In the mean time, it sounds like I can get around this issue by supplying the actual bytes instead of a path? Is that correct?

That's correct, if you make your management server push update with inline_string, the hash will be different and trigger a update. We use that in Istio.