canonical / istio-operators

Charmed Istio

`nodeSelectorTerms` gets ignored because of OR logic #426

Closed DnPlas closed 3 months ago

DnPlas commented 4 months ago

Bug Description

istio-gateway 1.16 ships a default nodeSelectorTerms entry under requiredDuringSchedulingIgnoredDuringExecution (see here). Any custom nodeSelectorTerm injected by the namespace-node-affinity-operator is therefore "ORed" with that default, which results in unexpected scheduling: the default term can already be satisfied, so the Pod is schedulable regardless of the injected term. That happens because:

If you specify multiple terms in nodeSelectorTerms associated with nodeAffinity types, then the Pod can be scheduled onto a node if one of the specified terms can be satisfied (terms are ORed). Reference

Please note this issue only happens in 1.16, as later versions do not have a default option (removed by https://github.com/canonical/istio-operators/commit/88c9b987a0371e88e574574c5917e79ca965418e).
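The ORed evaluation described above can be illustrated with a simplified model of the scheduler's matching rules. This is only a sketch: the function names are made up for illustration, `matchFields` and the `Gt`/`Lt` operators are not modeled, and the default term shown (`kubernetes.io/arch In amd64`) is a hypothetical stand-in, not necessarily the exact default shipped in istio-gateway 1.16. The injected term uses the same `workload=istio-deploys-here` label as the settings.yaml shown later in this thread.

```python
# Simplified model of requiredDuringSchedulingIgnoredDuringExecution:
# nodeSelectorTerms are ORed, matchExpressions inside one term are ANDed.

def expression_matches(expr, labels):
    """Evaluate a single matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return labels.get(key) in values
    if op == "NotIn":
        return labels.get(key) not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not modeled in this sketch: {op}")


def term_matches(term, labels):
    """All expressions of a term must hold; an empty term matches nothing."""
    expressions = term.get("matchExpressions", [])
    return bool(expressions) and all(
        expression_matches(e, labels) for e in expressions
    )


def node_is_eligible(terms, labels):
    """A node is eligible if ANY term matches (terms are ORed)."""
    return any(term_matches(t, labels) for t in terms)


# Hypothetical default term, standing in for the one shipped with 1.16.
default_term = {"matchExpressions": [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
]}
# Term injected by namespace-node-affinity.
injected_term = {"matchExpressions": [
    {"key": "workload", "operator": "In", "values": ["istio-deploys-here"]},
]}

# A node that does NOT carry the custom workload label.
unlabeled_node = {"kubernetes.io/arch": "amd64"}

with_default = node_is_eligible([default_term, injected_term], unlabeled_node)
without_default = node_is_eligible([injected_term], unlabeled_node)
print(with_default, without_default)
```

With the default term present, the unlabeled node is still eligible because the default term alone is satisfied; with only the injected term, it is not.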

To Reproduce

  1. Deploy istio-operators 1.16/stable
  2. Deploy namespace-node-affinity
  3. Pass a configuration to inject custom nodeSelectorTerms
  4. Check that the Pod is scheduled on the desired Node (which Node that is depends on the custom configuration)
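The check in step 4 boils down to comparing the Pod's `spec.nodeName` with the labels of that Node. A minimal sketch, assuming the Pod and Node objects have already been parsed from the JSON output of `kubectl get pod <name> -o json` and `kubectl get nodes -o json`; the helper name, node names, and sample objects below are illustrative only.

```python
def pod_runs_on_labeled_node(pod, nodes, key, value):
    """Return True if the Node the Pod is bound to carries label key=value."""
    node_name = pod.get("spec", {}).get("nodeName")
    if node_name is None:
        return False  # the Pod has not been scheduled yet
    for node in nodes:
        if node["metadata"]["name"] == node_name:
            return node["metadata"].get("labels", {}).get(key) == value
    return False  # the Node is not in the provided list


# Illustrative objects, trimmed to the fields the helper reads.
sample_nodes = [
    {"metadata": {"name": "node-a", "labels": {"workload": "istio-deploys-here"}}},
    {"metadata": {"name": "node-b", "labels": {}}},
]
pod_on_a = {"spec": {"nodeName": "node-a"}}
pod_on_b = {"spec": {"nodeName": "node-b"}}

print(pod_runs_on_labeled_node(pod_on_a, sample_nodes, "workload", "istio-deploys-here"))
print(pod_runs_on_labeled_node(pod_on_b, sample_nodes, "workload", "istio-deploys-here"))
```

On an affected 1.16 deployment, the second case (Pod bound to a Node without the label) is the unexpected outcome this issue describes.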

Environment

  1. Kubernetes cluster with at least two nodes
  2. istio-operators 1.16/stable
  3. Nodes are labeled so that we can schedule Pods based on their nodeSelectors

Relevant Log Output

N/A

Additional Context

Similar issue

syncronize-issues-to-jira[bot] commented 4 months ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5760.

This message was autogenerated

DnPlas commented 3 months ago

istio-operators 1.17/stable

Since this is not an issue in istio-operators 1.17/stable, I was able to deploy the istio-ingressgateway workload Pod on a labeled Node after customizing the nodeSelectorTerms. This is my environment:

Model       Controller  Cloud/Region        Version  SLA          Timestamp
istio-test  uk8s-31     microk8s/localhost  3.1.8    unsupported  16:24:00Z

App                      Version  Status  Scale  Charm                    Channel      Rev  Address         Exposed  Message
istio-ingressgateway              active      1  istio-gateway            1.17/stable  723  10.152.183.168  no       
istio-pilot                       active      1  istio-pilot              1.17/stable  827  10.152.183.249  no       
namespace-node-affinity           active      1  namespace-node-affinity  2.2/stable    22  10.152.183.147  no

root@microk8s-admin:~# cat settings.yaml
istio-test: |
  nodeSelectorTerms:
    - matchExpressions:
      - key: workload
        operator: In
        values:
        - istio-deploys-here

Upgrading from 1.16/stable -> 1.17/stable

Upgrading is rather easy: users who don't want to wait for #438 to be merged can just refresh the istio-operators charms following this guide. Please note that istio-operators 1.17/stable requires juju >= 3.1 and at least Kubernetes 1.25.

Fix for 1.16/stable

The fix for this version of the charm is to just remove the default values from the Gateway deployment, similar to how it is done in 1.17/stable. Users can try this fix by deploying the charm in #438.
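To make "removing the default values" concrete, here is a sketch of the idea at the manifest level. It is not the actual change in #438: it assumes the Gateway Deployment is available as a plain dictionary, the helper name is made up, and the default term in the sample manifest is a hypothetical placeholder.

```python
import copy


def drop_default_required_affinity(deployment):
    """Return a copy of the Deployment without the required node affinity block."""
    result = copy.deepcopy(deployment)
    pod_spec = result["spec"]["template"]["spec"]
    node_affinity = pod_spec.get("affinity", {}).get("nodeAffinity", {})
    # With no default term left, an injected term is the only one evaluated.
    node_affinity.pop("requiredDuringSchedulingIgnoredDuringExecution", None)
    return result


# Trimmed, illustrative manifest with a hypothetical default term.
gateway = {"spec": {"template": {"spec": {"affinity": {"nodeAffinity": {
    "requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [
        {"matchExpressions": [
            {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
        ]},
    ]},
}}}}}}

fixed = drop_default_required_affinity(gateway)
print(fixed["spec"]["template"]["spec"]["affinity"]["nodeAffinity"])
```

Once the default block is gone, any nodeSelectorTerms injected by namespace-node-affinity are the only terms the scheduler sees, so the OR with a default term no longer applies.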

DnPlas commented 3 months ago

Fixed by #438