cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

new feature enforce_policy_on_l7lb is too limited #32755

Open christiancadieux opened 4 months ago

christiancadieux commented 4 months ago

Is there an existing issue for this?

What happened?

A CCNP using fromCIDRSet and fromEntities does not work correctly for Ingress traffic to a gateway.

With this CCNP, the fromCIDRSet rule is enforced and the traffic is blocked when the CIDR does not match, as explained in the Isovalent blog https://isovalent.com/blog/post/cilium-1-15/#h-ingress-network-policy:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: policy-gateway
spec:
  endpointSelector: {}
  ingress:
  - fromCIDRSet:
    - cidr: 20.112.189.0/24
  - fromEntities:
    - ingress

But as soon as an endpoint selector is added, the CIDR is ignored and the traffic is always allowed.

Or maybe I misunderstood this new feature enforce_policy_on_l7lb and it only works with "endpointSelector: {}". But if that is true, then this new feature would not be very useful, since it would only allow or block specific IPs for ingress to the whole cluster, and these clusters are typically multi-tenant, where each tenant needs different CIDRs allowed. (A complete version of the non-working policy is sketched after the fragment below.)

endpointSelector:
    matchLabels:
      app: details
ingress:
  - fromCIDRSet:
    - cidr: 20.112.189.0/24
  - fromEntities:
    - ingress
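
For reference, written out in full, the non-working policy I am describing would be roughly this (a sketch only; apiVersion and metadata added for completeness, with the name, labels, and CIDR copied from the fragments above):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: policy-gateway
spec:
  endpointSelector:
    matchLabels:
      app: details
  ingress:
  - fromCIDRSet:
    - cidr: 20.112.189.0/24
  - fromEntities:
    - ingress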

For this test, there is a default-deny CNP at the namespace level:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  namespace: my-namespace
  name: default-allow-dns
spec:
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
  endpointSelector: {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system

When the CCNP endpointSelector is changed from app=details to app=details2, the traffic stops (because of the default-deny), confirming that the CCNP is active. I also tried without the default-deny CNP, and it still behaves incorrectly.

But with endpointSelector app=details, the traffic comes back, even when ingress.fromCIDRSet.cidr is invalid. I also verified that adding the specific namespace to the CCNP does not help:

...
endpointSelector:
    matchLabels:
      app: details
      k8s:io.kubernetes.pod.namespace: my-namespace

Cilium Version

cilium image (default): v1.14.6
cilium image (stable): v1.15.5

Kernel Version

Linux caas-bglab-comp009--10-112-182-135 5.15.119-flatcar #1 SMP Fri Jul 14 17:48:03 -00 2023 x86_64

Kubernetes Version

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.15", GitCommit:"da6089da4974a0a180c226c9353e1921fa3c248a", GitTreeState:"clean", BuildDate:"2023-10-18T13:29:23Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}

Regression

No response

Sysdump

No response

Relevant log output

No response

Anything else?

cilium config

arping-refresh-period                             30s
auto-direct-node-routes                           false
bgp-secrets-namespace                             kube-system
bpf-lb-acceleration                               disabled
bpf-lb-dsr-dispatch                               geneve
bpf-lb-external-clusterip                         false
bpf-lb-map-max                                    65536
bpf-lb-mode                                       dsr
bpf-lb-sock                                       false
bpf-map-dynamic-size-ratio                        0.0025
bpf-policy-map-max                                16384
bpf-root                                          /sys/fs/bpf
cgroup-root                                       /run/cilium/cgroupv2
cilium-endpoint-gc-interval                       5m0s
cluster-id                                        2
cluster-name                                      bglab-rdei-cni-test-02
cluster-pool-ipv4-cidr                            198.19.0.0/16
cluster-pool-ipv4-mask-size                       24
cluster-pool-ipv6-cidr                            2001:558:104c:10a::/96
cluster-pool-ipv6-mask-size                       112
cni-exclusive                                     true
cni-log-file                                      /var/run/cilium/cilium-cni.log
controller-group-metrics                          write-cni-file sync-host-ips sync-lb-maps-with-k8s-services
custom-cni-conf                                   false
debug                                             false
dnsproxy-enable-transparent-mode                  true
egress-gateway-reconciliation-trigger-interval    1s
enable-auto-protect-node-port-range               true
enable-bgp-control-plane                          true
enable-bpf-clock-probe                            false
enable-bpf-masquerade                             true
enable-endpoint-health-checking                   true
enable-envoy-config                               true
enable-gateway-api                                true
enable-gateway-api-secrets-sync                   true
enable-health-check-loadbalancer-ip               false
enable-health-check-nodeport                      true
enable-health-checking                            true
enable-hubble                                     true
enable-hubble-open-metrics                        true
enable-ipv4                                       true
enable-ipv4-big-tcp                               false
enable-ipv4-masquerade                            true
enable-ipv6                                       true
enable-ipv6-big-tcp                               false
enable-ipv6-masquerade                            false
enable-k8s-networkpolicy                          true
enable-k8s-terminating-endpoint                   true
enable-l2-neigh-discovery                         true
enable-l7-proxy                                   true
enable-local-redirect-policy                      false
enable-masquerade-to-route-source                 false
enable-metrics                                    true
enable-policy                                     default
enable-remote-node-identity                       true
enable-sctp                                       false
enable-svc-source-range-check                     true
enable-vtep                                       false
enable-well-known-identities                      false
enable-xt-socket-fallback                         true
enforce_policy_on_l7lb                            true
etcd-config                                       ---
endpoints:
  - https://etcd-n1:2379
  - https://etcd-n2:2379
  - https://etcd-n3:2379
trusted-ca-file: '/var/lib/etcd-secrets/etcd-client-ca.crt'
key-file: '/var/lib/etcd-secrets/etcd-client.key'
cert-file: '/var/lib/etcd-secrets/etcd-client.crt'
external-envoy-proxy                           false
gateway-api-secrets-namespace                  cilium-secrets
hubble-disable-tls                             false
hubble-export-file-max-backups                 5
hubble-export-file-max-size-mb                 10
hubble-listen-address                          :4244
hubble-metrics                                 dns drop tcp flow port-distribution icmp httpV2:exemplars=true;labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction
hubble-metrics-server                          :9965
hubble-socket-path                             /var/run/cilium/hubble.sock
hubble-tls-cert-file                           /var/lib/cilium/tls/hubble/server.crt
hubble-tls-client-ca-files                     /var/lib/cilium/tls/hubble/client-ca.crt
hubble-tls-key-file                            /var/lib/cilium/tls/hubble/server.key
identity-allocation-mode                       kvstore
identity-gc-interval                           15m0s
identity-heartbeat-timeout                     30m0s
install-no-conntrack-iptables-rules            false
ipam                                           cluster-pool
ipam-cilium-node-update-rate                   15s
k8s-client-burst                               10
k8s-client-qps                                 5
kube-proxy-replacement                         true
kube-proxy-replacement-healthz-bind-address    
kvstore                                        etcd
kvstore-opt                                    {"etcd.config": "/var/lib/etcd-config/etcd.config"}
max-connected-clusters                         255
mesh-auth-enabled                              true
mesh-auth-gc-interval                          5m0s
mesh-auth-queue-size                           1024
mesh-auth-rotated-identities-queue-size        1024
monitor-aggregation                            medium
monitor-aggregation-flags                      all
monitor-aggregation-interval                   5s
node-port-bind-protection                      true
nodes-gc-interval                              5m0s
operator-api-serve-addr                        127.0.0.1:9234
operator-prometheus-serve-addr                 :9963
preallocate-bpf-maps                           false
procfs                                         /host/proc
prometheus-serve-addr                          :9962
proxy-connect-timeout                          2
proxy-max-connection-duration-seconds          0
proxy-max-requests-per-connection              0
proxy-prometheus-port                          9964
remove-cilium-node-taints                      true
routing-mode                                   tunnel
service-no-backend-response                    reject
set-cilium-is-up-condition                     true
set-cilium-node-taints                         true
sidecar-istio-proxy-image                      cilium/istio_proxy
skip-cnp-status-startup-clean                  false
synchronize-k8s-nodes                          true
tofqdns-dns-reject-response-code               refused
tofqdns-enable-dns-compression                 true
tofqdns-endpoint-max-ip-per-hostname           50
tofqdns-idle-connection-grace-period           0s
tofqdns-max-deferred-connection-deletes        10000
tofqdns-proxy-response-max-delay               100ms
tunnel-protocol                                geneve

Cilium Users Document

Code of Conduct

joestringer commented 4 months ago

Hi @christiancadieux, thanks for the report. Could you restate the problem in simpler terms? For instance: "When I configure a policy exactly like the one below, I expect _, but when I do curl ..., I observe ___."

christiancadieux commented 4 months ago

When I use the CCNP from https://isovalent.com/blog/post/cilium-1-15/#h-ingress-network-policy:

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: ingress-ccnp
spec:
  endpointSelector: {}
  ingress:
    - fromCIDRSet:
        - cidr: 172.18.0.10/32
    - fromEntities:
        - cluster

It works fine and does block ingress traffic except from the specified CIDR (172.18.0.10/32 in this example), as the article explains.

But if I enter specific labels in the endpointSelector (to limit ingress to specific namespaces/pods), then it no longer works and any source IP is allowed to reach the pods selected by the endpointSelector.

networkop commented 3 months ago

This is by design: the policies are enforced at the Ingress, before the backends are selected. The empty endpointSelector selects all endpoints, but it's also possible to make the policy apply only to ingress by using:

  endpointSelector:
    matchExpressions:
    - key: reserved:ingress
      operator: Exists

christiancadieux commented 3 months ago

I am not following this. The goal of this new option enforce_policy_on_l7lb is to restrict ingress access to specific source IPs. Can you give a complete CNP or CCNP example that would allow a specific pod in a specific namespace to receive ingress traffic from a specific external source IP while blocking all other source IPs? I tried the following, for example (with an invalid CIDR in 'outside-to-envoy'), but it does not block anything:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: default-deny-expect-kube-system
spec:
  egress:
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
  endpointSelector: {}
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: outside-to-envoy
spec:
  endpointSelector:
    matchExpressions:
    - key: reserved:ingress
      operator: Exists
  ingress:
    - fromCIDRSet:
      - cidr: 50.112.189.0/24

---
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: envoy-to-pods
spec:
  endpointSelector: {}
  ingress:
    - fromEntities:
      - ingress

networkop commented 3 months ago

This feature does not work per Ingress or per namespace; it only works with a CCNP and applies globally to all Ingress/GwAPI resources.
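
For illustration, the globally-scoped form that is supported would look roughly like the sketch below (the policy name is a placeholder, and the CIDR is the one from the blog example earlier in this thread):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: allow-cidr-to-all-ingresses
spec:
  endpointSelector:
    matchExpressions:
    - key: reserved:ingress
      operator: Exists
  ingress:
  - fromCIDRSet:
    - cidr: 172.18.0.10/32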

christiancadieux commented 3 months ago

Right, so I don't understand the value of this feature. Cilium is designed to support large multi-tenant clusters, and each tenant has its own security requirements. Blocking CIDRs for the whole cluster is not useful; that is the problem I am describing. This feature is not consistent with Cilium network policies in general.

networkop commented 3 months ago

That's the current scope of the feature. I agree, it's not as useful as it would've been with per-namespace or per-Ingress support. That may still happen, but most likely it'll be a separate feature. The current behaviour is the result of an internal implementation detail: each Ingress reserves an IP on every node, and this IP has a 1-to-1 mapping to the ingress identity.

christiancadieux commented 2 months ago

I should not have closed this ticket.

I understand that "the feature does not work per Ingress or per namespace", and the feature being referenced is "enforce_policy_on_l7lb", but this bug is about the general problem that NetworkPolicies can only be written against 'ingress' as a whole, and not against specific deployments and namespaces.

nilsherzig commented 1 month ago

Just to make sure I understand this correctly, there currently is no way to allow external traffic to ingress_a from CIDR_a and to ingress_b from CIDR_b while blocking everything else?

christiancadieux commented 1 month ago

I may not be following your ingress_a/ingress_b example, but yes, the feature only allows limiting ingress access for the whole cluster; it does not allow anything more specific than that, as networkop explained in https://github.com/cilium/cilium/issues/32755#issuecomment-2147602124.
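
In other words, the only pattern that seems to work today is a single cluster-wide allow list applied to the ingress identity, which cannot distinguish between individual Ingress resources. Roughly (the policy name and CIDRs below are illustrative placeholders):

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: cluster-wide-ingress-allowlist
spec:
  endpointSelector:
    matchExpressions:
    - key: reserved:ingress
      operator: Exists
  ingress:
  - fromCIDRSet:
    - cidr: 192.0.2.0/24     # stand-in for CIDR_a
    - cidr: 198.51.100.0/24  # stand-in for CIDR_b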