envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0

Question: Filter chain not matching on expected criteria #7341

Open bcelenza opened 5 years ago

bcelenza commented 5 years ago

Bug Template

Title: Expected filter chain match is not made

Description:

Consider the following listener configuration:

      address:
        socket_address:
          address: 0.0.0.0
          port_value: 15001
      filter_chains:
      - filter_chain_match:
          destination_port: 443
          server_names:
          - "*.foo.com"
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: foo
      - filter_chain_match:
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: bar
      listener_filters:
      - name: envoy.listener.original_dst
      - name: envoy.listener.tls_inspector

The clusters in this example are both simple original destination clusters (and could conceivably be the same cluster).
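For reference, a minimal sketch of what the foo cluster might look like in this setup (the connect_timeout value is illustrative; bar would be analogous):

      clusters:
      - name: foo
        connect_timeout: 1s
        type: ORIGINAL_DST
        lb_policy: ORIGINAL_DST_LB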

If I send an HTTP request to https://somewhere.foo.com:443, the request succeeds as expected. However, if I send an HTTP request to https://bar.com:443, Envoy rejects the request with:

[source/server/connection_handler_impl.cc:231] closing connection: no matching filter chain found

The filter chain match documentation states:

In order for a filter chain to be selected, ALL of its criteria must be fulfilled by the incoming connection, properties of which are set by the networking stack and/or listener filters.

My understanding is that the second example request should match the filter chain match for all destination addresses (0.0.0.0/0), but this does not occur, and Envoy claims no match is made. However, if I remove the first filter chain match (port 443 for *.foo.com), the match is made and the request is processed as expected.

I have verified this behavior in Envoy 1.9.0 and 1.9.1.

Is this expected behavior for Envoy?

mattklein123 commented 5 years ago

I would need to go look at the code in more detail. @PiotrSikora @silentdai do you have a quick assessment of whether this is a bug or WAI?

PiotrSikora commented 5 years ago

This is working as intended - the decision tree first matches on :443, which eliminates all filter chains without a :443 match from further consideration.

If you add destination_port: 443 to the second filter chain, then you'll get the behavior you're expecting.
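Applied to the listener above, the suggested second filter chain would look something like this:

      - filter_chain_match:
          destination_port: 443
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: bar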

bcelenza commented 5 years ago

Thanks for the clarification.

Is there any reason why the decision tree cannot back out of the initial match on port if the remaining criteria do not match? Or perhaps there's a reason for this approach that I'm just not seeing?

As written, the documentation quoted above is at least a bit confusing to me: in this case, the first filter chain match doesn't fulfill ALL of its criteria, the next filter chain match does fulfill all of its criteria, and yet Envoy never reaches that decision even though there's an obvious match at the bottom of the chain.

If you add destination_port: 443 to the second filter chain, then you'll get the behavior you're expecting.

Is this true? If I add destination_port: 443 to the second filter chain, that filter chain would always be matched, because destination IP is higher in priority for matching than SNI.

mattklein123 commented 5 years ago

Is there any reason why the decision tree cannot back out of the initial match on port if the remaining criteria do not match? Or perhaps there's a reason for this approach that I'm just not seeing?

The implementation today is a giant set of nested hash tables. I think it would be possible to implement backup and restart but it would need some thinking.

As written, the documentation quoted above is at least a bit confusing to me: in this case, the first filter chain match doesn't fulfill ALL of its criteria, the next filter chain match does fulfill all of its criteria, and yet Envoy never reaches that decision even though there's an obvious match at the bottom of the chain.

It would be great to get a PR in to clarify the documentation, at least until we decide if we want to allow alternate behavior.

Is this true? If I add destination_port: 443 to the second filter chain, that filter chain would always be matched, because destination IP is higher in priority for matching than SNI.

I think that's right. @PiotrSikora is there a way to express the config intent here with the current implementation?

lambdai commented 5 years ago

The decision tree model has strong advantages:

  1. 10,000 simple filter chain builds per second on the main thread, and
  2. an amortized 250 ns spent on the worker thread per data plane request. This could be worse in practice, because my benchmark's working set is small and the cases are the easiest ones.

Extending to a backup search may not grow the nested hash tables much, but the upper bound on the cost of finding a filter chain could explode:

  1. It's easy to draft a config in which every filter chain could match the request. In that case, decision tree lookup is obviously worse than first-match.
  2. Readability. Once backup search is implemented, it is almost impossible to figure out the best match by eye. Try hiding the 2 chains above among 10 other chains and tracing the backup search by hand.
  3. It depends on how backup search is achieved. If we add more entries to the decision tree, the conflicts at the tree leaves could get heavier.
lambdai commented 5 years ago

@bcelenza How about this config:

      - filter_chain_match:
          destination_port: 443
+         prefix_ranges:
+         - address_prefix: 0.0.0.0
+           prefix_len: 0
          server_names:
          - "*.foo.com"
        filters: foo, details omitted
+     - filter_chain_match:
+         destination_port: 443
+         prefix_ranges:
+         - address_prefix: 0.0.0.0
+           prefix_len: 0
+       filters: bar, details omitted
      - filter_chain_match:
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0
        filters: bar, details omitted
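For readability, the same suggestion written out in full (a sketch reusing the tcp_proxy filters from the original config):

      filter_chains:
      - filter_chain_match:
          destination_port: 443
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0
          server_names:
          - "*.foo.com"
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: foo
      - filter_chain_match:
          destination_port: 443
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: bar
      - filter_chain_match:
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0
        filters:
        - name: envoy.tcp_proxy
          config:
            stat_prefix: egress
            cluster: bar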
mattklein123 commented 5 years ago

My personal feeling on this is that we shouldn't implement backup, but if we really need to we should make the precedence order configurable which is already a TODO in the code.

bcelenza commented 5 years ago

I agree that implementing backup probably isn't the best thing, as we would essentially need to specify it as an alternate matching approach for backwards compatibility.

@silentdai Your approach does help. Thanks.

From my perspective this issue can be closed. Thanks to you all for the help and clarification.

PiotrSikora commented 5 years ago

Is this true? If I add destination_port: 443 to the second filter chain, that filter chain would always be matched, because destination IP is higher in priority for matching than SNI.

Yes and no.

Matching on 0.0.0.0/0 is the default behavior: it's implied when no prefix_ranges is configured. That means it shouldn't matter whether it's specified in your configuration or not, since both filter chains should be matching on 0.0.0.0/0.

Having said that, it seems that we have a bug, and that's not happening right now.

I believe it's broken because we create subtrees under both: EMPTY_STRING (default case) and 0.0.0.0/0 (when configured), and only convert EMPTY_STRING to 0.0.0.0/0 when creating a trie, but it's too late in the process, and one of the subtrees is getting lost.
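For illustration, a minimal sketch of the mismatch, assuming the internals described above - these two wildcard matches should be equivalent, but end up keyed differently when the subtrees are built:

      filter_chains:
      - filter_chain_match:        # no prefix_ranges: subtree keyed on EMPTY_STRING
          server_names:
          - "*.foo.com"
      - filter_chain_match:        # explicit wildcard: subtree keyed on 0.0.0.0/0
          prefix_ranges:
          - address_prefix: 0.0.0.0
            prefix_len: 0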

@silentdai could you fix this? I'm happy to help if you run into issues.

It would be great to get a PR in to clarify the documentation, at least until we decide if we want to allow alternate behavior.

@silentdai would you mind taking a stab at this as well? Thanks!

lambdai commented 5 years ago

I believe it's broken because we create subtrees under both: EMPTY_STRING (default case) and 0.0.0.0/0 (when configured), and only convert EMPTY_STRING to 0.0.0.0/0 when creating a trie, but it's too late in the process, and one of the subtrees is getting lost.

Sounds like the correct root cause. I will fix it once #7246 is merged.

would you mind taking a stab at this as well? Thanks!

Absolutely. Will do

lambdai commented 5 years ago

@PiotrSikora could you assign this issue to me so I can track it?