LuyaoZhong opened 3 years ago
/cc @lizan @asraa @ggreenway
Can you please elaborate on the desired traffic flow (the client envoy's position, the server, and which connections are TLS vs plaintext)?
I am curious what kind of cert is needed for the google/bing access.
If the upstream is google/bing, envoy doesn't terminate TLS but initiates TLS.
The straw man flow confuses me: is the cert applied in downstream connection or upstream connection?
@ggreenway @lambdai The desired traffic flow is like this: <downstream/internal service> ---- TLS(mimic cert generated by Envoy) ---- < Envoy> ---- TLS ---- <upstream/external service>
I mean envoy needs to terminate the downstream TLS first, then we can apply many filters to control internal services accessing the external network, and after that envoy initiates TLS to the upstream. The mimic cert will be applied to the downstream connection. There is no change to the upstream connection. I'm not sure "terminate" is the proper word; if not, please correct me.
Thanks for your comments.
Ok, I think I understand now. Let me paraphrase to make sure I understand: you'd like for envoy to have a CA cert/key, trusted by the downstream client, and for envoy to dynamically generate a TLS cert signed by the CA cert/key for whatever name is in the SNI of a connection?
@ggreenway Yes, exactly. Does it make sense for you?
I wrote some PoC code for dynamically generating cert, and I tested the downstream TLS handshake using the mimic cert.
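For illustration only (this is not the actual PoC code), the core of dynamically generating a cert for an SNI name, signed by a local root CA, can be sketched in Python with the third-party cryptography package; all names below are hypothetical:

```python
# Sketch only: mint a "mimic" cert for an SNI name, signed by a local root CA
# that downstream clients are assumed to trust. Hypothetical names throughout.
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID


def make_root_ca():
    """Self-signed root CA (in the real setup this would come from config)."""
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "bump-root-ca")])
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(name)
        .issuer_name(name)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=365))
        .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
        .sign(key, hashes.SHA256())
    )
    return key, cert


def mimic_cert(ca_key, ca_cert, sni):
    """Mint a leaf cert whose subject CN and SAN carry the requested SNI value."""
    leaf_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, sni)]))
        .issuer_name(ca_cert.subject)  # chain up to the local root CA
        .public_key(leaf_key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=1))
        .add_extension(x509.SubjectAlternativeName([x509.DNSName(sni)]), critical=False)
        .sign(ca_key, hashes.SHA256())  # signed by the CA key, not the leaf key
    )
    return leaf_key, cert
```

The real PoC does the equivalent with BoringSSL inside Envoy; the point is only that the leaf cert's name material is derived from the SNI while the signature chains to the configured root CA.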
As for the API change: envoy currently requires certs (static or SDS) to be set in the config yaml file, and the code path doesn't take the case I mentioned into consideration. To support this feature I need a proper API to indicate that the TLS handshake will use a dynamically generated cert. I would like your help with this new API definition. I'm thinking about adding "tls_root_certificates" to CommonTlsContext, valid only when the CommonTlsContext is part of a DownstreamTlsContext:
[extensions.transport_sockets.tls.v3.CommonTlsContext]
{ "tls_params": "{...}", "tls_certificates": [], "tls_root_certificates": [], "tls_certificate_sds_secret_configs": [], "validation_context": "{...}", "validation_context_sds_secret_config": "{...}", "combined_validation_context": "{...}", "alpn_protocols": [], "custom_handshaker": "{...}" }
[extensions.transport_sockets.tls.v3.TlsRootCertificate]
{ "root_ca_cert": "{...}", "root_ca_key": "{...}" }
Do you think it is reasonable?
I think a more general approach would be to implement this as a listener filter. It could either run after tls_inspector
(which reads the SNI value), or re-implement that part. It can then generate the needed cert, and we can add an API for a listener filter to signal to the TLS transport_socket which certificate to use.
There have been other feature requests to support extremely large numbers of fixed/pre-generated certs and to choose the correct one at runtime, and this implementation could support that use case as well.
Does that sound workable to you?
Generating certs in a listener filter sounds workable. But an API for a listener filter might not be enough: the old DownstreamTlsContext still requires the user to set tls certificates, so we can't avoid touching DownstreamTlsContext or its sub-APIs.
I think we could add a FilterState from the listener filter which contains the cert/key to use, and have SslSocket check for its presence and set the cert on the SSL* (not the SSL_CTX*).
Yes, it's the SSL* (not the SSL_CTX*).
Let me list several questions and answers to make the design clear:
- Where to generate certs? After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for all SNIs; we want tls_inspector to detect the SNI first, then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic. In my PoC I generate the certs in SslSocket::setTransportSocketCallbacks [1].
- How to change the DownstreamTlsContext API? [2] shows Envoy requiring the user to set tls certificates, otherwise it exits during bootstrap. I went through some code; an easy way is to introduce an API to indicate that it has the capability to provide certificates [3].
- The scope of DownstreamTlsContext (2nd question): I prefer it to be per transport socket rather than per listener, what do you think?
[2] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L411
[3] https://github.com/envoyproxy/envoy/blob/9cc74781d818aaa58b9cca9602fe8dc62181d27b/source/extensions/transport_sockets/tls/context_config_impl.cc#L408
- Where to generate certs? After deliberation, I think tls_inspector is not a good place for generating certs, because we don't want to dynamically generate certs for all SNIs; we want tls_inspector to detect the SNI first, then dispatch to a different filter chain according to the SNI. This is more flexible, since we can have a different cert config policy for each filter chain: static, SDS, or dynamic.
Yeah this all makes sense, having generating part in transport socket sounds reasonable to me. We might need a cache to store generated cert so they aren't generated for every connection.
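A minimal sketch of what such a cache could look like (plain Python, hypothetical names, not Envoy code): mint a cert at most once per SNI, even when many connections for the same name arrive.

```python
# Sketch only: per-SNI cache so an expensive cert generation runs once per name.
import threading


class MimicCertCache:
    def __init__(self, generate):
        self._generate = generate  # callable: sni -> cert (expensive to run)
        self._lock = threading.Lock()
        self._certs = {}

    def get(self, sni):
        """Return the cached cert for this SNI, minting it on first use."""
        with self._lock:
            cert = self._certs.get(sni)
            if cert is None:
                cert = self._generate(sni)
                self._certs[sni] = cert
            return cert
```

Locally this is just per-process state; the SDS idea discussed below would play a similar role, but shared through a server.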
Perhaps SDS should act as that cache. RDS/ECDS/EDS maintain the N:1 mapping (N subscriptions, 1 config), and it would not be surprising to introduce that to SDS.
@LuyaoZhong My understanding is that your POC is generating a CSR; if this functionality can be moved to SDS, some of the existing SDS servers could be leveraged.
@lizan @lambdai Thanks for your comments. A cache sounds good. SDS could be one option to cache the dynamic certs, we are supposed to support both local cache and SDS, right? If so, I want to start with local cache and then introduce SDS later. Does it make sense for you?
I investigated the API, related classes and workflow, and completed the first version of code, see https://github.com/envoyproxy/envoy/pull/19137.
In this code version, we have done:
common_tls_context:
tls_root_ca_certificate:
cert: {"filename": "root-ca.pem"}
private_key: {"filename": "root-ca.key"}
I'll split the patch, polish the code, and reword the original proposal description after some design details settle down. Could you help review the design items I listed above? What are your suggestions?
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This is not stale.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
@lizan @ggreenway could you help label this as "no stalebot"? The previous design and PoC mimic the certificate and do the TLS handshake with the downstream before connecting to the upstream; is it possible to support the following workflow in Envoy?
You could achieve this by writing a listener filter. I think you could put it in the listener filters after the existing tls inspector. At this point the SNI is known, so you could do whatever asynchronous work you need to do, and when it's finished, allow the filter chain to continue.
- downstream requires accessing some external websites like "https://www.google.com"
Reading the below, I think the domain name is captured by Envoy via SNI? I want to add that the http host in plain text should also be supported.
- Envoy receives the CLIENT_HELLO but doesn't do the handshake with the downstream until step 5
- Envoy connects to "https://www.google.com" (upstream) and gets the real server certificate
AFAIK there are not many Envoy components you could use in the context of a listener filter. But it's definitely possible.
- Envoy copies the subject, subject alt name, extensions, etc from real server certificate and generates mimic certificate
- Envoy does TLS handshake with downstream using mimic certificate
With #19582, right? It seems a TLS transport socket needs to be generated on demand along with a mimic cert provider. That could be something new, or we could continue the work on creating an on-demand network filter (which includes a transport socket).
- traffic is decrypted and goes through Envoy network filters.
@ggreenway @lambdai Will it cause two connections to the upstream for one request? We need to connect to the upstream and get the server certificate back first, and after the traffic is decrypted and goes through the network filters, we connect to the upstream again to transfer traffic data. Is it possible to reuse the first connection?
Besides, #19582 is an extension in the current TLS transport socket; if implementing the mimicking inside a listener filter, it seems I need to implement a TLS transport socket inside the listener filter, otherwise I cannot reuse #19582. Does it make sense to integrate a transport socket inside a listener filter?
@lambdai could you provide more details about "there is not many envoy components in listener filter". How big is the gap?
re: connect upstream and get server certificate
Can this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.
add @liverbirdkte
@lambdai @ggreenway
re:
connect upstream and get server certificate
Can this job be achieved as part of the cert provider bootstrap (or another extension)? If so, you only need to reference the new component in the listener filter and register a resume path to drive the listener filter once the cert is fetched.
That sounds like moving the implementation from the listener filter to the cert provider, but the cert provider has fewer components than a listener filter and it's not ready for now. Which design do you prefer: cert provider + listener filter, or a standalone listener filter?
We have to connect to the upstream in the downstream subsystem of Envoy. Besides, we need to store that socket somewhere and use it when transferring data to the upstream; otherwise AFAIK Envoy will try to create a new connection to the upstream. Is there any risk in implementing this?
@ggreenway @lambdai
HCM is designed as a terminal filter, which brings a lot of limitations in our case. We want this feature to work with HCM. After the traffic is decrypted, we make the data go through HCM; users can plug in many security functions by extending http filters, so the traffic to external services is monitored.
Current HCM sets up the connection with the upstream in the http router filter after receiving http headers. If we implement another listener filter, network filter, or any other component to get the server cert, how to make HCM reuse that connection is the problem we need to address. It doesn't seem easy; do you have any suggestions?
We came up with another idea. What about making HCM implement both ReadFilter and WriteFilter and work as a non-terminal filter? We would need to decouple the request and response processing: onData (ReadFilter) corresponds to the request path, onWrite (WriteFilter) to the response path. HCM would not connect to the upstream; to get the server cert and send/receive data from the upstream, we would need a terminal filter like tcp proxy at the end of the network filter chain.
Does it make sense?
@ggreenway @lambdai ping
I don't understand what you're trying to accomplish. Are you trying to make sure that downstream connections re-use the same upstream connection?
Changing HCM to a non-terminal filter does not seem like a viable approach.
Sorry, I don't fully understand your intention. I sincerely think you need a "better" (in terms of reuse and consuming RDS) http async client to fetch the cert.
The HCM as a network filter is overkill, because you need to feed the data to HCM and drain it.
Since I don't know how good the current http async client is, you can always use the current http async client against an internal cluster.
That internal cluster contains the internal address. Meanwhile you can create an internal listener "listening" on that address, and that listener contains your desired HCM which can consume RDS and use any upstream cluster type.
@ggreenway @lambdai
This feature needs much more work than we were imagining. To avoid confusion, I drafted a document to provide context. It's mostly text for now; I can add more diagrams later to help understanding if you like. Could you leave your comments there?
Why was I proposing a non-terminal HCM? In our case we need to connect to the upstream after receiving the downstream TLS CLIENT HELLO, so we set up a secure connection with the upstream via the tcp connection pool. After that, we will receive http packets, and HCM will try to set up a new connection with the upstream via the http connection pool, but we want HCM to reuse the pre-established upstream connection. That seems hard to implement. And in our case the http route match is not useful: we connect to a dynamic cluster based on SNI, not on the http url. So the idea of making HCM a non-terminal filter came up: a non-terminal HCM only cares about data processing, and we can add a terminal filter at the end of the filter chain to send/receive data to/from the upstream.
I have the same use case too. I was wondering whether an approach using the custom_handshaker extension can offer a viable solution for this use case. I have not dug into the details yet and wanted to check.
@ggreenway, this was discussed in https://github.com/envoyproxy/envoy/issues/20708 and I was wondering whether using custom_handshaker offers a way to solve this use case.
Also, from what I understood, the custom_handshaker implementation has to be compiled into Envoy rather than being injected/loaded into Envoy like a Lua/WASM network filter extension.
@LuyaoZhong Sorry for joining the discussion late. I'm not sure I have enough background here, but I have a question about your current solution and am thinking of another solution.
First question: why do we need to reuse the connection here? I thought only the first downstream connection to a specific host needs to fetch the upstream cert; after that Envoy already has enough info to create the mimic cert for follow-up downstream connections. So it should be ok to use a separate connection just for fetching the cert info on the first downstream connection.
The second question is why we need to modify HCM. I suppose that only works for HTTP, which would drop the TCP use case.
I'm not sure I'm thinking right, but I did think about using the dynamic forward proxy, since I feel it is very similar to what the sni_dynamic_forward_proxy filter does here; the difference is fetching upstream cert info instead of the DNS query.
How about using the sni_dynamic_forward_proxy network filter and the envoy.clusters.dynamic_forward_proxy cluster here? Something like this:
static_resources:
listeners:
- name: my_listener
address:
socket_address:
protocol: TCP
address: 0.0.0.0
port_value: 13333
listener_filters:
- name: envoy.filters.listener.tls_inspector
filter_chains:
- filters:
- name: envoy.filters.network.sni_dynamic_forward_proxy
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.sni_dynamic_forward_proxy.v3.FilterConfig
port_value: 443
dns_cache_config:
name: dynamic_forward_proxy_cache_config
dns_lookup_family: V4_ONLY
- name: envoy.tcp_proxy
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
stat_prefix: ingress_tcp
cluster: my_cluster
clusters:
- name: my_cluster
lb_policy: CLUSTER_PROVIDED
cluster_type:
name: envoy.clusters.dynamic_forward_proxy
typed_config:
"@type": type.googleapis.com/envoy.extensions.clusters.dynamic_forward_proxy.v3.ClusterConfig
dns_cache_config:
name: dynamic_forward_proxy_cache_config
dns_lookup_family: V4_ONLY
We can add the functionality of fetching upstream cert info into the TCP proxy filter's onNewConnection() callback (or into a new network filter between sni_dynamic_forward_proxy and the tcp proxy filter). Before you get enough info, you can keep returning StopIteration from the TCP proxy filter, to let the downstream connection hang there.
The full flow is as below:
1. The onNewConnection() callback is invoked on the network filter (this happens before the downstream SSL handshake): https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/network/filter_manager_impl.cc#L44 https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/network/filter_manager_impl.cc#L70
2. The sni_dynamic_forward_proxy filter's onNewConnection() callback queries the DNS based on SNI; the dynamic_forward_proxy cluster creates an entry for it.
3. In the tcp proxy filter's onNewConnection() callback (or a new network filter), you can already pick the route based on SNI: https://github.com/envoyproxy/envoy/blob/1938af1d5cf650d2baa391124f1a61c939cab56f/source/common/tcp_proxy/tcp_proxy.cc#L574
4. Using the route, create an upstream connection and fetch the upstream cert info, but keep returning StopIteration until you get enough info. (This is similar to how the sni_dynamic_forward_proxy filter queries the DNS while blocking the connection.)
5. Store the cert info (e.g. in StreamInfo), then return Continue from the tcp proxy filter's onNewConnection() callback.
I slightly prefer to create a new network filter rather than adding this to the TCP proxy filter, and use it between the sni_dynamic_forward_proxy filter and the tcp proxy filter.
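The StopIteration-until-ready part of that flow can be modeled with a toy sketch (plain Python, hypothetical names, not Envoy code):

```python
# Toy model of steps 4-5 above: a network filter that starts an async fetch of
# the upstream cert info on the first onNewConnection() call, keeps returning
# StopIteration until the fetch completes, then returns Continue.
STOP_ITERATION = "StopIteration"
CONTINUE = "Continue"


class CertFetchFilter:
    def __init__(self, start_fetch):
        # start_fetch(callback): kicks off the async fetch; invokes callback
        # with the cert info once the upstream handshake has produced it.
        self._start_fetch = start_fetch
        self._cert_info = None
        self._fetch_started = False

    def on_new_connection(self):
        if self._cert_info is not None:
            return CONTINUE
        if not self._fetch_started:
            self._fetch_started = True
            self._start_fetch(self._on_cert_fetched)
        return STOP_ITERATION

    def _on_cert_fetched(self, cert_info):
        # In Envoy this is where a resume path would re-drive the filter chain.
        self._cert_info = cert_info
```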
But yes, this is a high-level workflow; I believe there are still a lot of details.
cc @lambdai @ggreenway @mattklein123
@soulxu Thanks for your comments. We have to make it work with HCM, so we always struggle with how to reuse the upstream connection. Our proposed solution#1 in TLS Bumping in Envoy is similar to your suggestion, the new bumping filter is expected to work like tcp proxy and support getting and caching server cert.
After having a discussion with @soulxu and @liverbirdkte on the upstream connection reuse issue, we agree on giving up connection reuse. Because we will support server cert caching, duplicate connections only happen the first time a new external service is contacted; afterwards we use the cache (mimicked cert) in the bumping filter to do the downstream handshake and leave the upstream handshake to HCM. One extra connection the first time we contact an upstream should not cause much performance degradation, and it also avoids introducing complexity into the underlying framework just to address the reuse issue.
@ggreenway @lambdai @mattklein123
@soulxu Thanks for your comments. We have to make it work with HCM, so we always struggle with how to reuse the upstream connection. Our proposed solution#1 in TLS Bumping in Envoy is similar to your suggestion, the new bumping filter is expected to work like tcp proxy and support getting and caching server cert.
Thanks for updating solution #1; I prefer solution #1. It needn't modify HCM, and it works for both HTTP and TCP.
After having a discussion with @soulxu and @liverbirdkte on the upstream connection reuse issue, we agree on giving up connection reuse. Because we will support server cert caching, duplicate connections only happen the first time a new external service is contacted; afterwards we use the cache (mimicked cert) in the bumping filter to do the downstream handshake and leave the upstream handshake to HCM. One extra connection the first time we contact an upstream should not cause much performance degradation, and it also avoids introducing complexity into the underlying framework just to address the reuse issue.
Yeah, connection reuse doesn't provide much benefit and complicates the whole thing. Without connection reuse we can have solution #1 here. It is simpler.
In general solution #1 seems OK to me with a few caveats: 1) Per discussion I would recommend skipping connection reuse in the initial implementation. It will be much harder and I don't think the extra work is justified. 2) Some thought needs to be put into how the signing of the fake certs is going to be done. I'm honestly not thrilled with this being built directly into Envoy. It seems better to potentially have a gRPC API that can be used for this and then organizations can do this however they want, since how they manage CA certs, etc. is really out of scope for Envoy. 3) We should minimize changes to the existing systems as much as possible. All functionality should be in new optional extensions, besides any hooks that are needed to call the new functionality.
@mattklein123
In general solution #1 seems OK to me with a few caveats:
- Per discussion I would recommend skipping connection reuse in the initial implementation. It will be much harder and I don't think the extra work is justified.
Thanks for your vote on solution #1. We can dive into design details as the next step in this direction. Skipping reuse will be very helpful in simplifying our implementation.
- Some thought needs to be put into how the signing of the fake certs is going to be done. I'm honestly not thrilled with this being built directly into Envoy. It seems better to potentially have a gRPC API that can be used for this and then organizations can do this however they want, since how they manage CA certs, etc. is really out of scope for Envoy.
Yes, we agree that it's out of scope for Envoy; we are looking for a proper extension mechanism to support cert mimicking. Certificate provider was recommended before; after discussion with @derekguo001 based on your comments on #19217, we will investigate SDS as well and figure out which one is better for our case.
- We should minimize changes to the existing systems as much as possible. All functionality should be in new optional extensions, besides for any hooks that are needed to call the new functionality.
Yes, we are trying our best to follow these design principles.
A brief summary based on investigation of SDS and Certificate Provider.
Mimicking certs is essentially getting cert resources at runtime. This requires an on-demand update mechanism: Envoy acts as a resource initiator upon a data-plane request.
SDS: the on-demand mechanism depends on incremental xDS. Current SDS in Envoy does not support on-demand updates, though the incremental xDS protocol is ready on the Envoy side. Istio-agent (the SDS server) does not implement the incremental xDS protocol at all.
Another concern about SDS is how to carry parameters to the SDS server, e.g. the subject, subject alt name, RSA strength, etc.; we need this information to generate the cert. DeltaDiscoveryRequest has limited fields; dynamic_parameters in node might be a proper place, but it is marked as WIP.
Certificate Provider: this is still a work in progress, and we need to consider on-demand support here as well. We had some discussion on the certificate provider PR. It allows us to implement an arbitrary provider instance without protocol limitations.
For the initial implementation, we propose to continue utilizing Certificate Provider: complete the general framework first, and then implement a local cert provider service to make it easy to verify the whole bumping functionality. SDS is our backup, since there are still a lot of gaps.
What's your suggestion to this proposal? @mattklein123 @lambdai @markdroth @soulxu
Please correct me if I missed or misunderstood anything. Besides, is there any reference/guidance we can follow to design an on-demand mechanism?
@mattklein123 @lambdai @markdroth @soulxu ping
What's your suggestion to this proposal? @mattklein123 @lambdai @markdroth @soulxu
I don't have an opinion given the data. I'm not exactly sure what "certificate provider" means in this context and what the API would look like. SDS in general still seems like a better option to me if we can sort out the missing pieces. The lack of on-demand in Istio should not factor into how it's implemented in Envoy.
Continuing discussion from #1984:
I believe that a custom Handshaker can provide the server certificate if its HandshakerFactory sets provides_certificates = true in capabilities(), and then sets SSL_CTX_set_select_certificate_cb() in the handshaker, where the selection logic can happen. A custom handshaker is configured via config.core.v3.TypedExtensionConfig custom_handshaker = 13; in the CommonTlsContext.
I don't have an opinion given the data. I'm not exactly sure what "certificate provider" means in this context and what the API would look like. SDS in general still seems like a better option to me if we can sort out the missing pieces. The lack of on-demand in Istio should not factor into how it's implemented in Envoy.
@mattklein123
"certificate provider" API is ready, but not implemented yet. I have a WIP PR for it, the updates stop for some time since I focus on whole bumping design recently.
// Indicates a certificate to be obtained from a named CertificateProvider plugin instance.
// The plugin instances are defined in the client's bootstrap file.
// The plugin allows certificates to be fetched/refreshed over the network asynchronously with
// respect to the TLS handshake.
// [#not-implemented-hide:]
message CertificateProviderPluginInstance {
// Provider instance name. If not present, defaults to "default".
//
// Instance names should generally be defined not in terms of the underlying provider
// implementation (e.g., "file_watcher") but rather in terms of the function of the
// certificates (e.g., "foo_deployment_identity").
string instance_name = 1;
// Opaque name used to specify certificate instances or types. For example, "ROOTCA" to specify
// a root-certificate (validation context) or "example.com" to specify a certificate for a
// particular domain. Not all provider instances will actually use this field, so the value
// defaults to the empty string.
string certificate_name = 2;
}
message CommonTlsContext {
......
// Only one of *tls_certificates*, *tls_certificate_sds_secret_configs*,
// and *tls_certificate_provider_instance* may be used.
repeated TlsCertificate tls_certificates = 2;
repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6
[(validate.rules).repeated = {max_items: 2}];
// [#not-implemented-hide:]
CertificateProviderPluginInstance tls_certificate_provider_instance = 14;
......
}
The lack of on-demand in Istio is one of the gaps; Envoy does not support that either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch one single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, we would require the control plane to distribute more SDS configs, and I don't know how to implement this functionality. How to carry information from the data plane to the SDS server is another problem: this information is used to mimic certs, but we have request-format limitations when using the xDS protocol. If we use certificate provider, we can implement an instance providing multiple certificates, and we can implement bumping by registering a resume path to the cert provider on cert fetched.
I believe that a custom Handshaker can provide the server certificate if its HandshakerFactory sets provides_certificates = true in capabilities(), and then sets SSL_CTX_set_select_certificate_cb() in the handshaker where the selection logic can happen. A custom handshaker is configured via config.core.v3.TypedExtensionConfig custom_handshaker = 13; in the CommonTlsContext.
@ggreenway
As I mentioned above, certificate provider should be enough and easy. If we use a custom handshaker we still need something like a certificate provider inside the handshaker. Besides, we want to update the global context config when receiving the certs, not only the handshaker context; otherwise we would need to mimic certs every time we receive a request. Do we have to set SSL_CTX_set_select_certificate_cb() via a custom handshaker? Can we just modify the current selection function? My understanding is that this is a general feature: consider if we attach several static tls certificates for different SNIs in the transport socket, or we have several SDS configs fetching multiple certs; we can do SNI-based selection anyway, and fall back to some certificate if there is no match.
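As a sketch of the selection behavior I mean (plain Python, hypothetical names; the wildcard step is my own assumption, not existing Envoy behavior):

```python
# Sketch only: pick a certificate for a handshake based on SNI, falling back
# to a default certificate when nothing matches.
def select_certificate(certs_by_name, default_cert, sni):
    """Exact SNI match first, then a one-level wildcard, then the default."""
    if sni in certs_by_name:
        return certs_by_name[sni]
    if sni and "." in sni:
        wildcard = "*." + sni.split(".", 1)[1]
        if wildcard in certs_by_name:
            return certs_by_name[wildcard]
    return default_cert
```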
I was just giving an example of how to accomplish the task you were asking about in the comments of a very old PR.
I agree that it could make sense to have multiple certs in a single TLS context and have more logic to select them. But I think that's a separate feature from what this issue is tracking. Feel free to open an issue for it to discuss.
Using the certificate provider API for this to make a new extension seems fine. Beyond that you will need to work out the details which as @ggreenway says may involve a separate set of work items to make it easier to allow a handshake extension to select from multiple certs, through a provider, etc. I would recommend creating a document and outlining a very specific set of work items that we can agree on that will accomplish your task.
@mattklein123 @ggreenway
Thanks for your comments. I add a "Proposed Changes" section in bumping doc.
As for cert selection, let's go to https://github.com/envoyproxy/envoy/issues/21739 to discuss details.
I think both SDS (I don't think on-demand SDS should be an issue) and certificate provider work for you. As I understand it, it is always fine to add an extension for a non-core use case; also certificate provider seems to already have another use case (https://github.com/envoyproxy/envoy/issues/21292), so it becomes a reasonable extension point.
The certificate provider you defined in https://github.com/envoyproxy/envoy/pull/19582/files#diff-57c305aa5cc3e7196c5c808a13ff7819ab9dd089cabffda303d885dfde43ce13R19 seems strange to me, or maybe I didn't understand it correctly.
I think you needn't define a new custom certificate provider interface. The custom certificate provider should implement the existing interface: https://github.com/envoyproxy/envoy/blob/8259b33fea720672835d5c46722f0b97dfd69470/envoy/secret/secret_provider.h#L63-L64
@LuyaoZhong I think you're missing a chunk of work in your proposed solution: You will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.
I think both SDS (I don't think on-demand SDS should be an issue) and certificate provider work for you. As I understand it, it is always fine to add an extension for a non-core use case; also certificate provider seems to already have another use case (#21292), so it becomes a reasonable extension point.
The certificate provider you defined in https://github.com/envoyproxy/envoy/pull/19582/files#diff-57c305aa5cc3e7196c5c808a13ff7819ab9dd089cabffda303d885dfde43ce13R19 seems strange to me, or maybe I didn't understand it correctly.
I think you needn't define a new custom certificate provider interface. The custom certificate provider should implement the existing interface.
@soulxu We definitely need a new interface for the certificate provider; see the protobuf api https://github.com/envoyproxy/envoy/issues/18928#issuecomment-1156415443. The certificate provider needs to provide certificates based on one cert name.
SDS cannot satisfy my requirement. The lack of on-demand in Istio is one of the gaps; Envoy does not support that either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch one single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, we would require the control plane to distribute more SDS configs, and I don't know how to implement this functionality. How to carry information from the data plane to the SDS server is another problem: this information is used to mimic certs, but we have request-format limitations when using the xDS protocol.
> @LuyaoZhong I think you're missing a chunk of work in your proposed solution: you will need a way to delay the TLS handshake until you have the cert. This will probably involve a custom handshaker, which will have the integration points with your other code that fetches/generates the cert.
@ggreenway We can delay the TLS handshake until we have the cert with the current proposal. I give more details about how it works, and address your comments, in the bumping doc.
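The "delay the handshake until the cert is ready" step can be sketched outside of Envoy as an async provider that parks concurrent handshakes for the same SNI behind a single in-flight generation. A minimal Python sketch (the class and `generate` callable are illustrative, not part of the PoC):

```python
import asyncio

class OnDemandCertProvider:
    """Parks handshakes until the mimic cert for their SNI is ready,
    coalescing concurrent requests into a single generation."""

    def __init__(self, generate):
        self._generate = generate  # async callable: sni -> cert (any representation)
        self._pending = {}         # sni -> in-flight asyncio.Task

    async def get_cert(self, sni: str):
        task = self._pending.get(sni)
        if task is None:
            # First request for this SNI starts the (slow) generation;
            # later requests await the same task instead of starting another.
            task = asyncio.ensure_future(self._generate(sni))
            self._pending[sni] = task
        return await task
```

In Envoy terms, the handshake callback would resume only once `get_cert` completes; the coalescing avoids generating the same mimic cert once per concurrent connection.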
> @soulxu We definitely need a new interface for the certificate provider; see the protobuf API in #18928 (comment). The certificate provider needs to provide certificates based on a cert name.
Thanks! I'm not sure I understand correctly: is the problem that each TlsCertificateConfigProvider currently returns only one secret? I'm not sure whether it would be possible to change TlsCertificateConfigProvider to return multiple secrets.
Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate).
This would be consistent with the SDS provider and the static cert provider, so I thought you could reuse most of the other code. For example, you could also configure the private key provider, and the rest of the TLS transport would make it work.
Apologies if I still do not understand correctly.
> SDS cannot satisfy my requirement. The lack of on-demand SDS in Istio is one of the gaps, and Envoy does not support it either. Besides, each SDS config corresponds to one secret provider in the transport socket, which can only fetch a single extensions.transport_sockets.tls.v3.TlsCertificate; if we need multiple certificates, the control plane would have to distribute more SDS configs, and I don't know how to implement that. Carrying information from the data plane to the SDS server is another problem: this information is used for mimicking certs, but the xDS protocol limits the request format.
I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the allowed sites to access, and the control plane could generate those mimic certs before deploying Envoy.
But yes, I'm not sure whether that matches your original requirement.
Out of curiosity, in your use case, would you allow your admin/operator to control which sites can be mimicked?
> Actually, I'm thinking it would be great if the custom certificate provider could return the same TLS certificate config (https://www.envoyproxy.io/docs/envoy/latest/api-v3/extensions/transport_sockets/tls/v3/common.proto#envoy-v3-api-msg-extensions-transport-sockets-tls-v3-tlscertificate)
Yes, my plan is to let the cert provider return a list of TLS certificate configs: https://github.com/envoyproxy/envoy/pull/19582#discussion_r876788500. The cert provider PR will be updated accordingly.
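The shape of that interface (a list of TlsCertificate-like entries keyed by cert name) could look roughly like this. This is a hypothetical Python mock of the proto messages to show the contract, not Envoy's C++ API:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TlsCertificate:
    """Simplified stand-in for extensions.transport_sockets.tls.v3.TlsCertificate."""
    certificate_chain: str
    private_key: str

class CertificateProvider:
    """Unlike one SDS secret provider, which yields a single TlsCertificate,
    a provider instance returns all certificates for a given cert name."""

    def get_tls_certificates(self, cert_name: str) -> List[TlsCertificate]:
        raise NotImplementedError

class StaticCertificateProvider(CertificateProvider):
    """Trivial in-memory implementation, for illustration only."""

    def __init__(self, certs_by_name: Dict[str, List[TlsCertificate]]):
        self._certs = certs_by_name

    def get_tls_certificates(self, cert_name: str) -> List[TlsCertificate]:
        return self._certs.get(cert_name, [])
```

The key design point is the return type: one cert name can resolve to several certificates, so the transport socket does not need one SDS config per certificate.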
> I got it. It seems your key requirement is mimicking certs on demand, which led you to consider on-demand SDS. I'm just thinking that the admin or operator could pre-define the allowed sites to access, and the control plane could generate those mimic certs before deploying Envoy.
The control plane cannot mimic certs based on the real server certificate, so this must be handled in Envoy after connecting to the upstream.
> But yes, I'm not sure whether that matches your original requirement.
> Out of curiosity, in your use case, would you allow your admin/operator to control which sites can be mimicked?
Yes, we will allow the admin/operator to set a bumping list, or a list of sites we don't want bumped.
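Such an allow/deny decision could be a simple SNI matcher with wildcard support. A minimal sketch; `should_bump` and the list semantics (deny list wins) are assumptions for illustration, not the PoC's actual config model:

```python
from fnmatch import fnmatch

def should_bump(sni: str, bump_list, no_bump_list) -> bool:
    """Return True if the connection's SNI should be bumped.

    The deny list wins over the allow list; patterns use shell-style
    wildcards, e.g. "*.example.com".
    """
    if any(fnmatch(sni, pat) for pat in no_bump_list):
        return False
    return any(fnmatch(sni, pat) for pat in bump_list)
```

For example, an operator could bump `*.example.com` while explicitly excluding a sensitive host such as `bank.example.com`.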
TLS Bumping in Envoy Design Doc
2022.10.31
PoC: https://github.com/envoyproxy/envoy/pull/23192 README and configurations are in tls_bumping subdir
2022.07.13 update: 4 work items were worked out.
Certificate Provider instance - LocalMimicCertProvider https://github.com/envoyproxy/envoy/pull/23063
2022.04.24 update
Mimicking certs based only on SNI is probably not enough: we need the real server certificate so we can copy the subject, subject alternative name, and extensions, learn the RSA key strength, and more. The original proposal set up a client-first secure connection; to meet the above requirements we need a server-first secure connection.
Therefore, we expect a server-first workflow: Envoy first connects to the upstream and retrieves the real server certificate, mimics it, and only then completes the downstream TLS handshake.
Original Proposal
Title: decrypting communications between internal and external services
Description:
Changes (straw man)
Any comments are welcome.