envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

TLS bumping: decrypting communications between internal and external services #18928

Open LuyaoZhong opened 3 years ago

LuyaoZhong commented 3 years ago

TLS Bumping in Envoy Design Doc

2022.10.31

PoC: https://github.com/envoyproxy/envoy/pull/23192 README and configurations are in tls_bumping subdir


2022.07.13: 4 work items were worked out:

  1. Certificate Provider framework https://github.com/envoyproxy/envoy/issues/19308 https://github.com/envoyproxy/envoy/pull/19582
  2. SNI-based cert selection in tls transport socket https://github.com/envoyproxy/envoy/issues/21739 https://github.com/envoyproxy/envoy/pull/22036
  3. A new network filter - BumpingFilter https://github.com/envoyproxy/envoy/issues/22581 https://github.com/envoyproxy/envoy/pull/22582
  4. Certificate Provider instance - LocalMimicCertProvider https://github.com/envoyproxy/envoy/pull/23063

2022.04.24 update

Mimicking certs based only on SNI is probably not enough: we need the server's real certificate so we can copy the subject, subject alternative name, and extensions, learn the RSA key strength, and more. The original proposal set up a client-first secure connection; to meet the above requirements we need a server-first secure connection.

Therefore, we expect the workflow to look like this:

  1. a downstream client needs to access an external website such as "google.com"; the traffic is routed to Envoy
  2. Envoy receives the CLIENT_HELLO but does not handshake with the downstream until step 5
  3. Envoy connects to "google.com" (the upstream) and gets the real server certificate
  4. Envoy copies the subject, subject alternative name, extensions, etc. from the real server certificate and generates a mimic certificate (see the sketch after this list)
  5. Envoy does the TLS handshake with the downstream using the mimic certificate
  6. the traffic is decrypted and goes through Envoy network filters, especially HCM; there are many HTTP filters, and users can easily add more (e.g. via WASM) to plug in security functions
  7. the traffic is re-encrypted and sent to the upstream
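
To make step 4 concrete, here is a minimal, hypothetical OpenSSL-level sketch of mimicking a certificate from the real upstream certificate and signing it with the local bumping CA. The function name, error handling, and serial/validity choices are illustrative assumptions, not the actual PoC code.

// Hypothetical sketch of step 4: copy identity fields from the real upstream
// certificate and sign a mimic certificate with the local bumping CA.
#include <openssl/evp.h>
#include <openssl/x509.h>
#include <openssl/x509v3.h>

// real_cert: certificate presented by the upstream ("google.com").
// ca_cert/ca_key: the bumping CA trusted by downstream clients.
// mimic_key: freshly generated leaf key for the mimic certificate.
X509* mimicCertificate(X509* real_cert, X509* ca_cert, EVP_PKEY* ca_key, EVP_PKEY* mimic_key) {
  X509* mimic = X509_new();
  X509_set_version(mimic, 2); // X.509 v3
  ASN1_INTEGER_set(X509_get_serialNumber(mimic), 1); // a real implementation should randomize this

  // Copy the identity of the real server certificate.
  X509_set_subject_name(mimic, X509_get_subject_name(real_cert));

  // Copy subjectAltName (a full implementation would copy other extensions too).
  auto* sans = static_cast<GENERAL_NAMES*>(
      X509_get_ext_d2i(real_cert, NID_subject_alt_name, nullptr, nullptr));
  if (sans != nullptr) {
    X509_add1_ext_i2d(mimic, NID_subject_alt_name, sans, 0, X509V3_ADD_DEFAULT);
    GENERAL_NAMES_free(sans);
  }

  // Issuer and validity come from the bumping CA and local policy.
  X509_set_issuer_name(mimic, X509_get_subject_name(ca_cert));
  X509_gmtime_adj(X509_get_notBefore(mimic), 0);
  X509_gmtime_adj(X509_get_notAfter(mimic), 60 * 60 * 24); // e.g. one day

  X509_set_pubkey(mimic, mimic_key);
  X509_sign(mimic, ca_key, EVP_sha256());
  return mimic;
}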

Original Proposal

Title: decrypting communications between internal and external services

Description:

When Envoy works as a sidecar or egress gateway in a service mesh, Istio takes responsibility for certificate generation and pushes the configs to Envoy via xDS. But when Envoy works as a typical forward proxy, the internal services on the edge might access many different external websites such as Google or Bing, and Envoy doesn't provide the ability to terminate this kind of TLS traffic. For this scenario, we propose a method to let Envoy generate certs dynamically and do the TLS handshake. Then, if the client trusts the root CA that the certs are signed by, it can access external services under the control of Envoy.

Changes (straw man)

  1. introduce an API to enable this feature and configure the CA cert and key used for signing
  2. get the SNI from the tls_inspector (we need the SNI to generate certs; we can just reuse the tls_inspector, probably with no changes)
  3. generate certs according to the SNI
  4. set the certs on the SSL object and then do the handshake

Any comments are welcome.

rojkov commented 2 years ago

/cc @lizan @asraa @ggreenway

ggreenway commented 2 years ago

Can you please elaborate on the desired traffic flow (client, envoy's position, server, and which connections are TLS vs plaintext)?

lambdai commented 2 years ago

I am curious what kind of cert is needed for the google/bing access.

If the upstream is google/bing, envoy doesn't terminate tls but initiate tls.

The straw man flow confuses me: is the cert applied in downstream connection or upstream connection?

LuyaoZhong commented 2 years ago

@ggreenway @lambdai The desired traffic flow is like this: <downstream/internal service> ---- TLS(mimic cert generated by Envoy) ---- < Envoy> ---- TLS ---- <upstream/external service>

I mean Envoy needs to terminate the downstream TLS first, then we can apply many filters to control internal services accessing the external network, and after that Envoy initiates TLS to the upstream. The mimic cert will be applied to the downstream connection; there is no change to the upstream connection. I'm not sure whether "terminate" is the right word here; if not, please correct me.

Thanks for your comments.

ggreenway commented 2 years ago

Ok, I think I understand now. Let me paraphrase to make sure I understand: you'd like for envoy to have a CA cert/key, trusted by the downstream client, and for envoy to dynamically generate a TLS cert signed by the CA cert/key for whatever name is in the SNI of a connection?

LuyaoZhong commented 2 years ago

@ggreenway Yes, exactly. Does it make sense for you?

LuyaoZhong commented 2 years ago

I wrote some PoC code for dynamically generating certs, and I tested the downstream TLS handshake using the mimic cert.

For the API change: Envoy currently requires certs (static or SDS) to be set in the config yaml, and the code path doesn't take the case I mentioned into consideration. To support this feature I need a proper API to indicate that we will do the TLS handshake using a dynamically generated cert. I'd appreciate your help with this new API definition. I'm thinking about adding "tls_root_certificates" to CommonTlsContext, valid only when the CommonTlsContext is part of a DownstreamTlsContext:

[extensions.transport_sockets.tls.v3.CommonTlsContext]

{ "tls_params": "{...}", "tls_certificates": [], "tls_root_certificates": [], "tls_certificate_sds_secret_configs": [], "validation_context": "{...}", "validation_context_sds_secret_config": "{...}", "combined_validation_context": "{...}", "alpn_protocols": [], "custom_handshaker": "{...}" }

[extensions.transport_sockets.tls.v3.TlsRootCertificate]

{ "root_ca_cert": "{...}", "root_ca_key": "{...}" }

Do you think it is reasonable?

ggreenway commented 2 years ago

I think a more general approach would be to implement this as a listener filter. It could either run after tls_inspector (which reads the SNI value), or re-implement that part. It can then generate the needed cert, and we can add an API for a listener filter to signal to the TLS transport_socket which certificate to use.

There have been other feature requests to support extremely large numbers of fixed/pre-generated certs and to choose the correct one at runtime, and this implementation could support that use case as well.

Does that sound workable to you?

LuyaoZhong commented 2 years ago

Generating certs in a listener filter sounds workable. But an API for a listener filter might not be enough: the existing DownstreamTlsContext still requires the user to set TLS certificates, so we can't avoid touching DownstreamTlsContext or its sub-APIs.

ggreenway commented 2 years ago

I think we could add a FilterState from the listener filter which contains the cert/key to use, and have SslSocket check for its presence and set the cert on the SSL* (not the SSL_CTX*).
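
For illustration, a minimal sketch of what that could look like on the SslSocket side: once a FilterState-provided cert/key pair is found for the connection, install it on the per-connection SSL*. DynamicCertInfo and useDynamicCert are hypothetical names, not existing Envoy types; only the SSL_use_certificate/SSL_use_PrivateKey calls are real BoringSSL/OpenSSL APIs.

#include <openssl/evp.h>
#include <openssl/ssl.h>
#include <openssl/x509.h>

// Hypothetical stand-in for the object a listener filter would publish via
// FilterState; not an existing Envoy type.
struct DynamicCertInfo {
  X509* cert;    // mimic certificate generated for this connection's SNI
  EVP_PKEY* key; // matching private key
};

// Called before the downstream handshake, e.g. by the TLS transport socket
// after it finds DynamicCertInfo for this connection. Setting the cert on the
// SSL* affects only this connection; the shared SSL_CTX* stays untouched.
bool useDynamicCert(SSL* ssl, const DynamicCertInfo& info) {
  return SSL_use_certificate(ssl, info.cert) == 1 &&
         SSL_use_PrivateKey(ssl, info.key) == 1;
}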

LuyaoZhong commented 2 years ago

Yes, it's SSL* (not SSL_CTX*).

Let me list several questions and answers to make the design clear:

  1. Where do we generate certs? After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI; we want tls_inspector to detect the SNI first and then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic. In my PoC I generate the certs in SslSocket::setTransportSocketCallbacks [1].

[1] https://github.com/envoyproxy/envoy/blob/3da250c6759ed9d2698e4e626fe1146cf696c316/source/extensions/transport_sockets/tls/ssl_socket.cc#L65

  2. Why can't we avoid touching the DownstreamTlsContext API? [2] shows Envoy requiring the user to set TLS certificates, otherwise it exits during bootstrap. I went through some code; an easy way is to introduce an API to indicate that it has the capability to provide certificates [3].

[2] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L411 [3] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L408

  3. Where do we set the CA cert/key? Since we have to modify DownstreamTlsContext (see the 2nd question), I prefer configuring it per transport socket rather than per listener. What do you think?

lizan commented 2 years ago
  1. Where do we generate certs? After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for every SNI; we want tls_inspector to detect the SNI first and then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic.

Yeah, this all makes sense; having the generating part in the transport socket sounds reasonable to me. We might need a cache to store generated certs so they aren't regenerated for every connection.
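
As a rough illustration of that cache, a hypothetical SNI-keyed store of generated cert/key pairs (the names and locking strategy are assumptions; an Envoy implementation might instead keep a per-worker, thread-local cache):

#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

struct GeneratedCert {
  std::string cert_pem; // mimic certificate, PEM-encoded
  std::string key_pem;  // matching private key, PEM-encoded
};

// Generated certs are created once per hostname and reused by later connections.
class DynamicCertCache {
public:
  std::shared_ptr<GeneratedCert> find(const std::string& sni) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = cache_.find(sni);
    return it == cache_.end() ? nullptr : it->second;
  }

  void insert(const std::string& sni, std::shared_ptr<GeneratedCert> cert) {
    std::lock_guard<std::mutex> lock(mu_);
    cache_[sni] = std::move(cert);
  }

private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<GeneratedCert>> cache_;
};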

lambdai commented 2 years ago

Perhaps SDS could act as that shared, reference-counted cache. RDS/ECDS/EDS maintain an N:1 mapping (N subscriptions, 1 config), so it would not be surprising to introduce the same to SDS.

@LuyaoZhong My understanding is that your PoC is generating a CSR; if this functionality could be moved to SDS, some existing SDS servers could be leveraged.

LuyaoZhong commented 2 years ago

@lizan @lambdai Thanks for your comments. A cache sounds good. SDS could be one option for caching the dynamic certs; we are supposed to support both a local cache and SDS, right? If so, I want to start with the local cache and introduce SDS later. Does that make sense to you?

LuyaoZhong commented 2 years ago

I investigated the API, related classes and workflow, and completed the first version of code, see https://github.com/envoyproxy/envoy/pull/19137.

In this version of the code, we have:

  1. introduced an API to set the root CA cert/key:
     common_tls_context:
       tls_root_ca_certificate:
         cert: {"filename": "root-ca.pem"}
         private_key: {"filename": "root-ca.key"}
  2. implemented a local cache to store generated cert pairs
  3. generated/reused dynamic certificate pairs in the TLS transport socket and set them on the SSL*:
     a. if there are no corresponding cached certs, create certs signed by the root CA, then store the generated certs in the local cache
     b. if there are corresponding cached certs, reuse them according to the host name

I'll split the patch, polish the code, and reword the original proposal description after the design details settle down. Could you help review the design items listed above? What are your suggestions?

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

LuyaoZhong commented 2 years ago

This is not stale.

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] commented 2 years ago

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

LuyaoZhong commented 2 years ago

@lizan @ggreenway could you help label this as "no stalebot"? The previous design and PoC mimic the certificate and do the TLS handshake with the downstream before connecting to the upstream. Is it possible to support the following workflow in Envoy?

  1. a downstream client needs to access an external website such as "https://www.google.com"
  2. Envoy receives the CLIENT_HELLO but does not handshake with the downstream until step 5
  3. Envoy connects to "https://www.google.com" (the upstream) and gets the real server certificate
  4. Envoy copies the subject, subject alternative name, extensions, etc. from the real server certificate and generates a mimic certificate
  5. Envoy does the TLS handshake with the downstream using the mimic certificate
  6. the traffic is decrypted and goes through Envoy network filters.

ggreenway commented 2 years ago

You could achieve this by writing a listener filter. I think you could put it in the listener filters after the existing tls inspector. At that point the SNI is known, so you could do whatever asynchronous work you need to do and, when it's finished, allow the filter chain to continue.

lambdai commented 2 years ago

Reading the below, I think the domain name is captured by Envoy via SNI? I want to add that the HTTP host in plain text is also supported.

  • Envoy receive the CLIENT_HELLO but don't do handshake with downstream until step5

  • Envoy connects "https://www.google.com"(upstream) and get real server certificate

AFAIK there are not many Envoy components you can use in the context of a listener filter. But it's definitely possible.

  • Envoy copies the subject, subject alt name, extensions, etc from real server certificate and generates mimic certificate
  • Envoy does TLS handshake with downstream using mimic certificate

With #19582, right? It seems a TLS transport socket needs to be generated on demand along with a mimic cert provider. That could be something new, or a continuation of the work on creating an on-demand network filter (which includes a transport socket).

  • traffic is decrypted and goes through Envoy network filters.

LuyaoZhong commented 2 years ago

@ggreenway @lambdai Will this cause two connections to the upstream for one request? We need to connect to the upstream and get the server certificate back first, and after the traffic is decrypted and goes through the network filters, we connect to the upstream again to transfer the traffic data. Is it possible to reuse the first connection?

Besides, #19582 is an extension of the current TLS transport socket; if the mimicking is implemented inside a listener filter, it seems I need to implement a TLS transport socket inside the listener filter, otherwise I cannot reuse #19582. Does it make sense to integrate a transport socket inside a listener filter?

@lambdai could you provide more details about "there are not many Envoy components in a listener filter"? How big is the gap?

lambdai commented 2 years ago

re: connect upstream and get server certificate

Can this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.

LuyaoZhong commented 2 years ago

add @liverbirdkte

LuyaoZhong commented 2 years ago

@lambdai @ggreenway

re: connect upstream and get server certificate

Can this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.

That sounds like moving the implementation from the listener filter to the cert provider, but the cert provider has fewer components available than the listener filter and it's not ready yet. Which design do you prefer: cert provider + listener filter, or a stand-alone listener filter?

We have to connect to the upstream from the downstream subsystem of Envoy. Besides, we need to store that socket somewhere and use it when transferring data to the upstream; otherwise, AFAIK, Envoy will try to create a new connection to the upstream. Is there any risk in implementing this?

LuyaoZhong commented 2 years ago

@ggreenway @lambdai

HCM is designed as a terminal filter, which brings a lot of limitations in our case. We want this feature to work with HCM: after the traffic is decrypted, we make the data go through HCM, and users can plug in many security functions by extending HTTP filters, so the traffic to external services is monitored.

Currently HCM sets up the connection with the upstream in the HTTP router filter after receiving the HTTP headers. If we implement another listener filter, network filter, or any other component to get the server cert, how to make HCM reuse that connection is the problem we need to address. It doesn't seem easy; do you have any suggestions?

We came up with another idea. What about making HCM implement both ReadFilter and WriteFilter and work as a non-terminal filter? We would need to decouple request and response processing: onData (ReadFilter) corresponds to the request path, and onWrite (WriteFilter) corresponds to the response path. HCM would not connect to the upstream; to get the server cert and send/receive data to/from the upstream, we would need a terminal filter like tcp_proxy at the end of the network filter chain. Does that make sense?

LuyaoZhong commented 2 years ago

@ggreenway @lambdai ping

ggreenway commented 2 years ago

I don't understand what you're trying to accomplish. Are you trying to make sure that downstream connections re-use the same upstream connection?

Changing HCM to a non-terminal filter does not seem like a viable approach.

lambdai commented 2 years ago

Sorry, I don't fully understand your intention. I sincerely think you need a "better" (in terms of reuse and consuming RDS) HTTP async client to fetch the cert.

Using HCM as a network filter is overkill, because you would need to feed data into HCM and drain it.

Since I don't know how good the current HTTP async client is, you can always use the current HTTP async client against an internal cluster. That internal cluster contains the internal address. Meanwhile, you can create an internal listener "listening" on that address, and that listener contains your desired HCM, which can consume RDS and use any upstream cluster type.

LuyaoZhong commented 2 years ago

@ggreenway @lambdai

This feature needs much more work than we were imagining. To avoid confusion, I drafted a document to provide context. It's mostly text for now; I could add more diagrams later to help you understand it if you like. Could you leave your comments there?

TLS Bumping in Envoy

Why was I proposing a non-terminal HCM? In our case we need to connect to the upstream after receiving the downstream TLS CLIENT_HELLO, so we need to set up the secure connection with the upstream via the TCP connection pool. After that, we will receive HTTP packets, and HCM will try to set up a new connection with the upstream via the HTTP connection pool, but we want HCM to reuse the pre-established upstream connection. That does not seem easy to implement. Also, in our case the HTTP route match is not useful, since we connect to a dynamic cluster based on SNI rather than the HTTP URL. So the idea of making HCM a non-terminal filter came up: a non-terminal HCM only cares about data processing, and we can add a terminal filter at the end of the filter chain to send/receive data to/from the upstream.

rohrit commented 2 years ago

I have the same use case too. I was wondering whether an approach using custom_handshaker extension can offer a viable solution for this use case. I have not dug into the details yet and wanted to check.

@ggreenway, this was discussed in https://github.com/envoyproxy/envoy/issues/20708, and I was wondering whether using custom_handshaker offers a way to solve this use case.

Also, from what I understood the custom_handshaker implementation has to be compiled into Envoy rather than being injected/loaded into Envoy like a Lua/WASM network filter extension.

soulxu commented 2 years ago

@LuyaoZhong Sorry for joining the discussion late. I'm not sure I have enough of the background loaded here, but I have a question about your current solution and am thinking of another solution.

First question: why do we need to reuse the connection here? I thought only the first downstream connection to a specific host needs to fetch the upstream cert; after that, Envoy already has enough info to create the mimic cert for follow-up downstream connections. So it should be OK to use a separate connection just for fetching the cert info on the first downstream connection.

The second question is why we need to modify HCM. I suppose that only works for HTTP, which would drop the use case for TCP.

I'm not sure I'm thinking about this correctly, but I did think about using the dynamic forward proxy, since this feels very similar to what the sni_dynamic_forward_proxy filter does; the difference is fetching the upstream cert info instead of doing the DNS query.

How about using the sni_dynamic_forward_proxy network filter and the envoy.clusters.dynamic_forward_proxy cluster here? Something like this:

static_resources:
  listeners:
  - name: my_listener
    address:
      socket_address:
        protocol: TCP
        address: 0.0.0.0
        port_value: 13333
    listener_filters:
    - name: envoy.filters.listener.tls_inspector
    filter_chains:
    - filters:
      - name: envoy.filters.network.sni_dynamic_forward_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.sni_dynamic_forward_proxy.v3.FilterConfig
          port_value: 443
          dns_cache_config:
            name: dynamic_forward_proxy_cache_config
            dns_lookup_family: V4_ONLY
      - name: envoy.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: ingress_tcp
          cluster: my_cluster
  clusters:
  - name: my_cluster
    lb_policy: CLUSTER_PROVIDED
    cluster_type:
      name: envoy.clusters.dynamic_forward_proxy
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
        dns_cache_config:
          name: dynamic_forward_proxy_cache_config
          dns_lookup_family: V4_ONLY

We can add the functionality of fetching upstream cert info to the TCP proxy filter's onNewConnection() callback (or to a new network filter placed between sni_dynamic_forward_proxy and the TCP proxy filter). Before you get enough info, you can keep returning StopIteration from the TCP proxy filter, leaving the downstream connection hanging there.

The full flow is as below:

  1. The client sends the request to Envoy
  2. The TLS inspector listener filter parses the SNI
  3. The new TCP connection is created inside Envoy and the filter chain is created for the connection https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/server/active_stream_listener_base.cc#L53-L54
  4. The onNewConnection() callback is invoked on the network filter (this happens before the downstream SSL handshake) https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/network/filter_manager_impl.cc#L44 https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/network/filter_manager_impl.cc#L70
  5. The sni_dynamic_forward_proxy filter's onNewConnection() callback queries DNS based on the SNI.
  6. When DNS is ready, the dynamic_forward_proxy cluster creates an entry for it.
  7. In the TCP proxy filter's onNewConnection() callback (or a new network filter), you can already pick the route based on the SNI https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/tcp_proxy/tcp_proxy.cc#L574 Using the route, create an upstream connection and fetch the upstream cert info, but keep returning StopIteration until you have enough info (this is similar to how the sni_dynamic_forward_proxy filter queries DNS and blocks the connection); a sketch of such a filter follows this list.
  8. Pass the upstream cert info back to the downstream transport socket (maybe through the StreamInfo), then return Continue from the TCP proxy filter's onNewConnection() callback
  9. The downstream transport socket mimics the cert and does the handshake, then processes the connection as normal (this will be a new upstream connection)
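
A rough sketch of the filter described in step 7, assuming the general Network::ReadFilter interface; the fetchUpstreamCert() helper is a placeholder for the asynchronous upstream certificate fetch, not an existing Envoy API:

#include <functional>

#include "envoy/buffer/buffer.h"
#include "envoy/network/filter.h"

namespace Envoy {

// Holds the downstream connection (StopIteration) until the upstream
// certificate has been fetched, then resumes the filter chain.
class CertFetchFilter : public Network::ReadFilter {
public:
  Network::FilterStatus onNewConnection() override {
    // Start the async fetch of the real server certificate; the callback runs
    // once the certificate is available.
    fetchUpstreamCert([this]() {
      cert_ready_ = true;
      read_callbacks_->continueReading(); // resume the held filter chain
    });
    return Network::FilterStatus::StopIteration;
  }

  Network::FilterStatus onData(Buffer::Instance&, bool) override {
    return cert_ready_ ? Network::FilterStatus::Continue
                       : Network::FilterStatus::StopIteration;
  }

  void initializeReadFilterCallbacks(Network::ReadFilterCallbacks& callbacks) override {
    read_callbacks_ = &callbacks;
  }

private:
  // Placeholder: connect to the upstream, read its leaf certificate, invoke cb.
  void fetchUpstreamCert(std::function<void()> cb);

  Network::ReadFilterCallbacks* read_callbacks_{};
  bool cert_ready_{false};
};

} // namespace Envoy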

I slightly prefer creating a new network filter rather than adding this to the TCP proxy filter, and using it between the sni_dynamic_forward_proxy filter and the TCP proxy filter.

But yes, this is a high-level workflow; I believe there are still a lot of details to work out.

cc @lambdai @ggreenway @mattklein123

LuyaoZhong commented 2 years ago

@soulxu Thanks for your comments. We have to make it work with HCM, so we keep struggling with how to reuse the upstream connection. Our proposed solution #1 in TLS Bumping in Envoy is similar to your suggestion: the new bumping filter is expected to work like tcp_proxy and support getting and caching the server cert.

After discussing the upstream connection reuse issue with @soulxu and @liverbirdkte, we agree on giving up connection reuse. Because we will support server cert caching, duplicate connections only happen the first time we connect to a new external service; afterwards we use the cache (mimicked cert) in the bumping filter to do the downstream handshake and leave the upstream handshake to HCM. Introducing one extra connection the first time we connect to an upstream should not cause much performance degradation, and it avoids adding complexity to the underlying framework only to address the reuse issue.

@ggreenway @lambdai @mattklein123

soulxu commented 2 years ago

@soulxu Thanks for your comments. We have to make it work with HCM, so we keep struggling with how to reuse the upstream connection. Our proposed solution #1 in TLS Bumping in Envoy is similar to your suggestion: the new bumping filter is expected to work like tcp_proxy and support getting and caching the server cert.

Thanks for updating solution #1; I prefer solution #1. It doesn't need to modify HCM, and it works for both HTTP and TCP.

After discussing the upstream connection reuse issue with @soulxu and @liverbirdkte, we agree on giving up connection reuse. Because we will support server cert caching, duplicate connections only happen the first time we connect to a new external service; afterwards we use the cache (mimicked cert) in the bumping filter to do the downstream handshake and leave the upstream handshake to HCM. Introducing one extra connection the first time we connect to an upstream should not cause much performance degradation, and it avoids adding complexity to the underlying framework only to address the reuse issue.

Yeah, reusing the connection doesn't provide much benefit and complicates the whole thing. Without connection reuse we can go with solution #1 here; it is simpler.

mattklein123 commented 2 years ago

In general solution #1 seems OK to me with a few caveats: 1) Per the discussion I would recommend skipping connection reuse in the initial implementation. It will be much harder and I don't think the extra work is justified. 2) Some thought needs to be put into how the signing of the fake certs is going to be done. I'm honestly not thrilled with this being built directly into Envoy. It seems better to potentially have a gRPC API that can be used for this, and then organizations can do this however they want, since how they manage CA certs, etc. is really out of scope for Envoy. 3) We should minimize changes to the existing systems as much as possible. All functionality should be in new optional extensions, besides any hooks that are needed to call the new functionality.

LuyaoZhong commented 2 years ago

@mattklein123

In general solution #1 seems OK to me with a few caveats:

  1. Per the discussion I would recommend skipping connection reuse in the initial implementation. It will be much harder and I don't think the extra work is justified.

Thanks for your vote on solution #1. We will dive into the design details next, following this direction. It will be very helpful in simplifying our implementation.

  2. Some thought needs to be put into how the signing of the fake certs is going to be done. I'm honestly not thrilled with this being built directly into Envoy. It seems better to potentially have a gRPC API that can be used for this, and then organizations can do this however they want, since how they manage CA certs, etc. is really out of scope for Envoy.

Yes, we agree that it's out of scope for Envoy; we are looking for a proper extension mechanism to support cert mimicking. The certificate provider was recommended before; after discussing with @derekguo001 based on your comments on #19217, we will investigate SDS as well and figure out which one is better for our case.

  3. We should minimize changes to the existing systems as much as possible. All functionality should be in new optional extensions, besides any hooks that are needed to call the new functionality.

Yes, we are trying our best to follow these design principles.

LuyaoZhong commented 2 years ago

A brief summary based on investigation of SDS and Certificate Provider.

Mimicking certs is essentially fetching cert resources at runtime. This requires an on-demand update mechanism in which Envoy acts as the resource initiator upon a data-plane request.

SDS: the on-demand mechanism depends on incremental xDS. Current SDS in Envoy does not support on-demand updates, even though the incremental xDS protocol is ready on the Envoy side; istio-agent (the SDS server) does not implement the incremental xDS protocol at all.

Another concern about SDS is how to carry parameters to the SDS server, e.g. the subject, subject alternative name, RSA strength, etc.; we need this information to generate the cert. DeltaDiscoveryRequest has limited fields; dynamic_parameters in node might be a proper place, but it is marked as WIP.

Certificate Provider: this is still a work in progress, and we need to consider on-demand support as well. We had some discussion on the certificate provider PR. It allows us to implement arbitrary provider instances without protocol limitations.

For the initial implementation, we propose to continue with the Certificate Provider: complete the general framework first, and then implement a local cert provider service to make it easy to verify the whole bumping functionality. SDS is our backup since there are still a lot of gaps.

What's your suggestion to this proposal? @mattklein123 @lambdai @markdroth @soulxu

Please correct me if I missed or misunderstood anything. Also, is there any reference/guidance we can follow to design an on-demand mechanism?

LuyaoZhong commented 2 years ago

@mattklein123 @lambdai @markdroth @soulxu ping

mattklein123 commented 2 years ago

What's your suggestion to this proposal? @mattklein123 @lambdai @markdroth @soulxu

I don't have an opinion given the data. I'm not exactly sure what "certificate provider" means in this context and what the API would look like. SDS in general still seems like a better option to me if we can sort out the missing pieces. The lack of on-demand in Istio should not factor into how it's implemented in Envoy.

ggreenway commented 2 years ago

Continuing discussion from #1984:

I believe that a custom handshaker can provide the server certificate if its HandshakerFactory sets provides_certificates = true in capabilities() and then sets SSL_CTX_set_select_certificate_cb() in the handshaker, where the selection logic can happen. A custom handshaker is configured via config.core.v3.TypedExtensionConfig custom_handshaker = 13; in the CommonTlsContext.
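
For reference, a minimal BoringSSL-level sketch of that selection hook (Envoy links against BoringSSL, which provides SSL_CTX_set_select_certificate_cb). The lookupMimicCert/lookupMimicKey helpers are placeholders for whatever cert provider or cache supplies the mimic cert; they are not existing APIs.

#include <openssl/ssl.h>

extern X509* lookupMimicCert(const char* sni);    // placeholder helper
extern EVP_PKEY* lookupMimicKey(const char* sni); // placeholder helper

static enum ssl_select_cert_result_t selectCertCb(const SSL_CLIENT_HELLO* client_hello) {
  SSL* ssl = client_hello->ssl;
  const char* sni = SSL_get_servername(ssl, TLSEXT_NAMETYPE_host_name);
  if (sni == nullptr) {
    return ssl_select_cert_success; // fall back to the statically configured cert
  }
  X509* cert = lookupMimicCert(sni);
  EVP_PKEY* key = lookupMimicKey(sni);
  if (cert == nullptr || key == nullptr) {
    // Not generated yet; ssl_select_cert_retry pauses the handshake so the
    // mimic cert can be fetched/generated asynchronously and then resumed.
    return ssl_select_cert_retry;
  }
  SSL_use_certificate(ssl, cert);
  SSL_use_PrivateKey(ssl, key);
  return ssl_select_cert_success;
}

// Registered once on the server-side SSL_CTX, e.g. by a custom handshaker
// whose factory declares provides_certificates in its capabilities.
void installSelectCertCb(SSL_CTX* ctx) {
  SSL_CTX_set_select_certificate_cb(ctx, selectCertCb);
}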

LuyaoZhong commented 2 years ago

I don't have an opinion given the data. I'm not exactly sure what "certificate provider" means in this context and what the API would look like. SDS in general still seems like a better option to me if we can sort out the missing pieces. The lack of on-demand in Istio should not factor into how it's implemented in Envoy.

@mattklein123

"certificate provider" API is ready, but not implemented yet. I have a WIP PR for it, the updates stop for some time since I focus on whole bumping design recently.

// Indicates a certificate to be obtained from a named CertificateProvider plugin instance.
// The plugin instances are defined in the client's bootstrap file.
// The plugin allows certificates to be fetched/refreshed over the network asynchronously with
// respect to the TLS handshake.
// [#not-implemented-hide:]
message CertificateProviderPluginInstance {
  // Provider instance name. If not present, defaults to "default".
  //
  // Instance names should generally be defined not in terms of the underlying provider
  // implementation (e.g., "file_watcher") but rather in terms of the function of the
  // certificates (e.g., "foo_deployment_identity").
  string instance_name = 1;

  // Opaque name used to specify certificate instances or types. For example, "ROOTCA" to specify
  // a root-certificate (validation context) or "example.com" to specify a certificate for a
  // particular domain. Not all provider instances will actually use this field, so the value
  // defaults to the empty string.
  string certificate_name = 2;
}

message CommonTlsContext {
......
  // Only one of *tls_certificates*, *tls_certificate_sds_secret_configs*,
  // and *tls_certificate_provider_instance* may be used.
  repeated TlsCertificate tls_certificates = 2;

  repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6
      [(validate.rules).repeated = {max_items: 2}];

  // [#not-implemented-hide:]
  CertificateProviderPluginInstance tls_certificate_provider_instance = 14;
......
}

The lack of on-demand support in Istio is one of the gaps; Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch a single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, the control plane would have to distribute more SDS configs, and I don't know how to implement that. How to carry information from the data plane to the SDS server is another problem: this information is needed for mimicking certs, but we have request format limitations when using the xDS protocol. If we use a certificate provider, we can implement an instance providing multiple certificates, and we can implement bumping by registering a resume path in the cert provider that fires once the cert is fetched.

I believe that a custom handshaker can provide the server certificate if its HandshakerFactory sets provides_certificates = true in capabilities() and then sets SSL_CTX_set_select_certificate_cb() in the handshaker, where the selection logic can happen. A custom handshaker is configured via config.core.v3.TypedExtensionConfig custom_handshaker = 13; in the CommonTlsContext.

@ggreenway

As I mentioned above, the certificate provider should be enough and easier. If we use a custom handshaker we still need something like a certificate provider inside the handshaker. Besides, we want to update the global context config when receiving the certs, not only the handshaker context; otherwise we would need to mimic certs every time we receive a request. Do we have to set SSL_CTX_set_select_certificate_cb() from a custom handshaker? Can we just modify the current selection function? My understanding is that this is a general feature: consider attaching several static TLS certificates for different SNIs in the transport socket, or having several SDS configs fetching multiple certs; we can do SNI-based selection anyway and fall back to some certificate if there is no match.

ggreenway commented 2 years ago

I was just giving an example of how to accomplish the task you were asking about in the comments of a very old PR.

I agree that it could make sense to have multiple certs in a single TLS context and have more logic to select them. But I think that's a separate feature from what this issue is tracking. Feel free to open an issue for it to discuss.

mattklein123 commented 2 years ago

Using the certificate provider API for this to make a new extension seems fine. Beyond that you will need to work out the details which as @ggreenway says may involve a separate set of work items to make it easier to allow a handshake extension to select from multiple certs, through a provider, etc. I would recommend creating a document and outlining a very specific set of work items that we can agree on that will accomplish your task.

LuyaoZhong commented 2 years ago

@mattklein123 @ggreenway

Thanks for your comments. I added a "Proposed Changes" section to the bumping doc.

As for cert selection, let's go to https://github.com/envoyproxy/envoy/issues/21739 to discuss details.

soulxu commented 2 years ago

I think both SDS (I don't think on-demand SDS should be an issue) and the certificate provider work for you. As I understand it, it is always fine to add an extension for a non-core use case; also, the certificate provider already seems to have other use cases (https://github.com/envoyproxy/envoy/issues/21292), so it becomes a reasonable extension point.

The certificate provider you defined in https://github.com/envoyproxy/envoy/pull/19582/files#diff-57c305aa5cc3e7196c5c808a13ff7819ab9dd089cabffda303d885dfde43ce13R19 seems strange to me, or maybe I didn't understand it correctly.

I think you needn't define a new custom certificate provider interface. The custom certificate provider should implement the existing interface https://github.com/envoyproxy/envoy/blob/8259b33fea720672835d5c46722f0b97dfd69470/envoy/secret/secret_provider.h#L63-L64

ggreenway commented 2 years ago

@LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.

LuyaoZhong commented 2 years ago

I think both SDS (I don't think on-demand SDS should be an issue) and the certificate provider work for you. As I understand it, it is always fine to add an extension for a non-core use case; also, the certificate provider already seems to have other use cases (#21292), so it becomes a reasonable extension point.

The certificate provider you defined in https://github.com/envoyproxy/envoy/pull/19582/files#diff-57c305aa5cc3e7196c5c808a13ff7819ab9dd089cabffda303d885dfde43ce13R19 seems strange to me, or maybe I didn't understand it correctly.

I think you needn't define a new custom certificate provider interface. The custom certificate provider should implement the existing interface

https://github.com/envoyproxy/envoy/blob/8259b33fea720672835d5c46722f0b97dfd69470/envoy/secret/secret_provider.h#L63-L64

@soulxu We definitely need a new interface for the certificate provider; see the protobuf API at https://github.com/envoyproxy/envoy/issues/18928#issuecomment-1156415443. The certificate provider needs to provide certificates based on a cert name.

SDS cannot satisfy my requirement. The lack of on-demand support in Istio is one of the gaps, and Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch a single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, the control plane would have to distribute more SDS configs, and I don't know how to implement that. How to carry information from the data plane to the SDS server is another problem: this information is needed for mimicking certs, but we have request format limitations when using the xDS protocol.

@LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.

@ggreenway We can delay the TLS handshake until we have the cert with the current proposal. I added more detail about how it works and addressed your comments in the bumping doc.

soulxu commented 2 years ago

@soulxu We definitely need a new interface for the certificate provider; see the protobuf API at #18928 (comment). The certificate provider needs to provide certificates based on a cert name.

Thanks! Not sure I understand correctly: is the problem that currently each TlsCertificateConfigProvider only returns one secret? Would it be possible to change TlsCertificateConfigProvider to return multiple secrets?

Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate).

This would be consistent with the SDS provider and the static cert provider, so I think you could reuse most of the other code. For example, you could also configure a private key provider, and the rest of the TLS transport socket would make it work.

Apologies if I still do not understand correctly.

SDS cannot satisfy my requirement. The lack of on-demand support in Istio is one of the gaps, and Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch a single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, the control plane would have to distribute more SDS configs, and I don't know how to implement that. How to carry information from the data plane to the SDS server is another problem: this information is needed for mimicking certs, but we have request format limitations when using the xDS protocol.

I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the allowed sites to access, and then the control plane could generate those mimicked certs before deploying Envoy.

But yes, I'm not sure whether that matches your original requirement.

Just curious: in your use case, would you allow your admin/operator to control which sites can be mimicked?

LuyaoZhong commented 2 years ago

Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate).

Yes, my plan is to let the cert provider return a TLS certificate config list https://github.com/envoyproxy/envoy/pull/19582#discussion_r876788500. This will be applied to the cert provider PR.

I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the allowed sites to access, and then the control plane could generate those mimicked certs before deploying Envoy.

The control plane cannot mimic the certs based on the real server cert, so this must be handled in Envoy after connecting to the upstream.

But yes, I'm not sure whether that matches your original requirement.

Just curious: in your use case, would you allow your admin/operator to control which sites can be mimicked?

Yes, we will allow the admin/operator to set a bumping list, or a list of sites we don't want to bump.