aquasecurity / trivy-operator

Kubernetes-native security toolkit
https://aquasecurity.github.io/trivy-operator/latest
Apache License 2.0

Nodecollector not running on specific nodes #1987

Closed: KevinDW-Fluxys closed this issue 6 months ago

KevinDW-Fluxys commented 6 months ago

What steps did you take and what happened: I have installed trivy-operator with the Helm chart, and have configured the node-collector as follows:

nodeCollector:
  # -- useNodeSelector determine if to use nodeSelector (by auto detecting node name) with node-collector scan job
  useNodeSelector: true
  # -- registry of the node-collector image
  registry: ghcr.io
  # -- repository of the node-collector image
  repository: aquasecurity/node-collector
  # -- tag version of the node-collector image
  tag: 0.1.2
  # -- imagePullSecret is the secret name to be used when pulling node-collector image from private registries example : reg-secret
  # It is the user responsibility to create the secret for the private registry in `trivy-operator` namespace
  imagePullSecret: ~
  # -- excludeNodes comma-separated node labels that the node-collector job should exclude from scanning (example kubernetes.io/arch=arm64,team=dev)
  excludeNodes: "agentpool=connect"

When I do this, the node-collector job does not appear. If I remove the excludeNodes: "agentpool=connect" setting, it tries to schedule on those nodes (which I don't want). The trivy-operator pods and scan-vulnerability jobs are running on the correct nodes.

My suspicion is that this is caused by these nodes having taints, while the node-collector does not have a tolerations parameter of its own and does not take the trivy-operator's tolerations parameter into account.

What did you expect to happen: Node collector jobs are scheduled on the correct nodes.

Anything else you would like to add:

Environment:

chen-keinan commented 6 months ago

@KevinDW-Fluxys you can set tolerations for the scan job with the scanJobTolerations parameter.
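
For reference, a minimal sketch of that parameter in the Helm values, using the agentpool taint from your setup (adjust key/value/effect to your own taints):

trivyOperator:
  # tolerations applied to scanner pods and node-collector so they can run on tainted nodes
  scanJobTolerations:
    - key: "agentpool"
      operator: "Equal"
      value: "default"
      effect: "NoExecute"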

KevinDW-Fluxys commented 6 months ago

@chen-keinan Unfortunately I already have this parameter set; that's also why my operator and the vulnerability scan jobs are running on the correct nodes. It seems like the node-collector does not take this parameter into account (or something else is wrong).

chen-keinan commented 6 months ago

@KevinDW-Fluxys can you please describe or fetch the node-collector job manifest to confirm the toleration has been set correctly? Also, can you please share the logs or the node-collector pod status?
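
For example, something like this should show whether a job/pod was created and what it looks like (namespace assumed to be trivy-operator, as in the manifests below; the app=node-collector label comes from the pod template; job and pod names are placeholders):

# list node-collector jobs and pods
kubectl -n trivy-operator get jobs
kubectl -n trivy-operator get pods -l app=node-collector
# dump the job manifest to check the tolerations
kubectl -n trivy-operator get job <node-collector-job-name> -o yaml
# pod status/events and logs
kubectl -n trivy-operator describe pod <node-collector-pod-name>
kubectl -n trivy-operator logs <node-collector-pod-name>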

KevinDW-Fluxys commented 6 months ago

There is no job being spawned. When I restart the trivy-operator pod it fires up some scan-vulnerabilityreport jobs, but no node-collector. If I remove the excludeNodes parameter I do get a job, which you can see below. The tolerations are there as expected, so that is probably not the problem, as I first suspected.

apiVersion: batch/v1
kind: Job
metadata:
  name: node-collector-647fddb8f4
  namespace: trivy-operator
  labels:
    app.kubernetes.io/managed-by: trivy-operator
    node-info.collector: Trivy
    trivy-operator.resource.kind: Node
    trivy-operator.resource.name: aks-connect-34676365-vmss000000
spec:
  parallelism: 1
  completions: 1
  activeDeadlineSeconds: 300
  backoffLimit: 0
  selector:
    matchLabels:
      batch.kubernetes.io/controller-uid: bfaadc70-637c-4b1e-913b-325ab8bf825d
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: node-collector
        batch.kubernetes.io/controller-uid: bfaadc70-637c-4b1e-913b-325ab8bf825d
        batch.kubernetes.io/job-name: node-collector-647fddb8f4
        controller-uid: bfaadc70-637c-4b1e-913b-325ab8bf825d
        job-name: node-collector-647fddb8f4
    spec:
      volumes:
        ...
      containers:
        - name: node-collector
          ...
      nodeSelector:
        kubernetes.io/hostname: aks-connect-34676365-vmss000000
      ...
      schedulerName: default-scheduler
      tolerations:
        - key: agentpool
          operator: Equal
          value: default
          effect: NoExecute
  completionMode: NonIndexed
  suspend: false
chen-keinan commented 6 months ago

@KevinDW-Fluxys you mention that you set the excludeNodes param; can you explain why you wanted to use it? You also mention that once you remove excludeNodes the job does appear; can you tell me what the status of the node-collector pod is?

KevinDW-Fluxys commented 6 months ago

@chen-keinan we have a situation where we have 2 nodepools (default & connect), and we only want to schedule on one of them (default). The nodepool we want to schedule on has a taint, and the other one doesn't. They also both have labels with the name of the nodepool. That's why I want to exclude the other nodepool by passing its label via excludeNodes, and I want the taint ignored via the toleration.
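
For reference, a quick way to double-check which pool carries which label and taint (assuming the agentpool label and taint key described above):

# show the agentpool label for every node
kubectl get nodes -L agentpool
# show each node's taints
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'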

The job is in status Pending with the following events:

Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   4m32s  default-scheduler   0/11 nodes are available: 2 node(s) had untolerated taint {agentpool: connect}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 11 Preemption is not helpful for scheduling..
  Normal   NotTriggerScaleUp  4m31s  cluster-autoscaler  pod didn't trigger scale-up: 2 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {agentpool: connect}
chen-keinan commented 6 months ago

@KevinDW-Fluxys the node-collector uses a nodeSelector because it needs to run on every node; maybe you want to set scanJobAffinity, if it can run in conjunction with tolerations.

teimyBr commented 6 months ago

Does nodeCollector not support tolerations? Or is scanJobTolerations also applied to the node-collector?

KevinDW-Fluxys commented 6 months ago

@chen-keinan I have added the following affinity, but now I have no node-collector jobs again.

scanJobAffinity: 
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: agentpool
            operator: In
            values:
            - default

I also noticed I had filled in the following parameter, which might be relevant:

scanJobNodeSelector: 
    agentpool: default

Maybe it's good to give a summary of all related parameters:

operator:
  # -- scanNodeCollectorLimit the maximum number of node collector jobs create by the operator
  scanNodeCollectorLimit: 1

trivyOperator:
  # -- scanJobAffinity affinity to be applied to the scanner pods and node-collector
  scanJobAffinity: 
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: agentpool
            operator: In
            values:
            - default
  # -- scanJobTolerations tolerations to be applied to the scanner pods and node-collector so that they can run on nodes with matching taints
  scanJobTolerations: 
  - key: "agentpool"
    operator: "Equal"
    value: "default" 
    effect: NoExecute
  # -- If you do want to specify tolerations, uncomment the following lines, adjust them as necessary, and remove the
  # square brackets after 'scanJobTolerations:'.
  # - key: "key1"
  #   operator: "Equal"
  #   value: "value1"
  #   effect: "NoSchedule"

  # -- scanJobNodeSelector nodeSelector to be applied to the scanner pods so that they can run on nodes with matching labels
  scanJobNodeSelector: 
    agentpool: default
  # -- If you do want to specify nodeSelector, uncomment the following lines, adjust them as necessary, and remove the
  # square brackets after 'scanJobNodeSelector:'.
  #   nodeType: worker
  #   cpu: sandylake
  #   teamOwner: operators

# -- tolerations set the operator tolerations
tolerations: 
  - key: "agentpool"
    operator: "Equal"
    value: "default" 
    effect: NoExecute

# -- affinity set the operator affinity
affinity: {}

nodeCollector:
  # -- useNodeSelector determine if to use nodeSelector (by auto detecting node name) with node-collector scan job
  useNodeSelector: true
  # -- excludeNodes comma-separated node labels that the node-collector job should exclude from scanning (example kubernetes.io/arch=arm64,team=dev)
  excludeNodes: "agentpool=connect"
chen-keinan commented 6 months ago

@KevinDW-Fluxys the two params we discussed: toleration and affinity.

KevinDW-Fluxys commented 6 months ago

@chen-keinan

They are set as you can see in my previous reply, or should they be set differently?

chen-keinan commented 6 months ago

@KevinDW-Fluxys currently scanJobNodeSelector works only for the vulnerability scan job, not for the node-collector; it requires a small change to support it in the node-collector. Other than that, the settings look OK.

teimyBr commented 6 months ago

It would be very nice if the nodeSelector also supported the node-collector.

KevinDW-Fluxys commented 6 months ago

@chen-keinan

With this configuration I'm not getting any node-collector jobs spawned. If the config looks OK, I'm not sure what is going wrong. Any idea what I can do to make my setup work? The use case is in essence quite simple: make the node-collector run on nodes with a certain taint, and not on nodes with a specific label.

chen-keinan commented 6 months ago

To be honest I do not know; the right params are set, maybe something is conflicting.

KevinDW-Fluxys commented 6 months ago

@chen-keinan After some more tests I noticed that the nodeSelector on the job is being set to the hostname of a node that should not have been selected. I have taints and tolerations set to avoid this node, but still its hostname is added as a nodeSelector. Since the job only tolerates the taint of the intended pool, it's logical that it can't be scheduled there.

nodeSelector:
    kubernetes.io/hostname: aks-connect-34676365-vmss000000

I am also running it on a second cluster where we have only one nodepool, and there the job is not being spawned at all. I'm not sure how the nodeSelector is being generated, but the issue probably lies there.

I have also tried toggling the following parameter, but it does not seem to have any effect.

nodeCollector:
  # -- useNodeSelector determine if to use nodeSelector (by auto detecting node name) with node-collector scan job
  useNodeSelector: true
chen-keinan commented 6 months ago

Setting this flag to false will not assign the node-collector to each Node, meaning the node-collector can potentially be assigned to the same Node and not to every Node.
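
For completeness, that alternative would be set like this in the Helm values (a sketch; with the selector disabled, placement is left to the scheduler and the scanJobTolerations/scanJobAffinity settings, so per-node coverage is no longer guaranteed):

nodeCollector:
  # do not pin each node-collector job to a specific node via kubernetes.io/hostname
  useNodeSelector: false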

KevinDW-Fluxys commented 6 months ago

@chen-keinan Thanks, that makes sense. However it does not change the issue I'm having. Do you have any idea what might be wrong with the generation of the job and the setting of the hostname?

chen-keinan commented 6 months ago

Can you please share again the error you get when the toleration is set? I'll try to investigate it.

KevinDW-Fluxys commented 6 months ago

@chen-keinan Sure, that would be this:

Events:
  Type     Reason             Age    From                Message
  ----     ------             ----   ----                -------
  Warning  FailedScheduling   4m32s  default-scheduler   0/11 nodes are available: 2 node(s) had untolerated taint {agentpool: connect}, 9 node(s) didn't match Pod's node affinity/selector. preemption: 0/11 nodes are available: 11 Preemption is not helpful for scheduling..
  Normal   NotTriggerScaleUp  4m31s  cluster-autoscaler  pod didn't trigger scale-up: 2 node(s) didn't match Pod's node affinity/selector, 1 node(s) had untolerated taint {agentpool: connect}

As some extra information/summary: we have 2 clusters (Cluster A and Cluster B).

All nodepools have a taint agentpool with their cluster-name.

When running with the above configuration on Cluster A, there are no node-collector jobs being generated. When running with the same configuration on Cluster B, there is a job being created with a nodeSelector pointing to a connect node (see above) and a toleration for the default nodepool.

Therefore the error message makes sense: it wants to schedule on a connect node, but that taint is untolerated, and the other nodes don't match the selector.

chen-keinan commented 6 months ago

@KevinDW-Fluxys do you mind also sharing your node configuration, so I can try to reproduce it?

kubectl get Node <node name> -o yaml

without exposing sensitive information

KevinDW-Fluxys commented 6 months ago

@chen-keinan You can find a redacted version of the nodes below. I think the main thing is to have only nodes with taints, and then try to get the node-collector job to schedule on nodes with a specific taint/label. As said before, we have a cluster with only one taint, where no job spawns, and a cluster with both taints, where a job spawns but with a nodeSelector on the node name of the pool where we don't want it.

apiVersion: v1
kind: Node
metadata:
  annotations:
    ...
  labels:
    agentpool: default
    name: aks-default-***-vmss000001
    ...
spec:
  ...
  taints:
  - effect: NoExecute
    key: agentpool
    value: default
status:
  ...
  nodeInfo:
    architecture: amd64
    bootID: ***
    containerRuntimeVersion: containerd://1.7.7-1
    kernelVersion: 5.15.0-1057-azure
    kubeProxyVersion: v1.28.5
    kubeletVersion: v1.28.5
    machineID: ***
    operatingSystem: linux
    osImage: Ubuntu 22.04.4 LTS
    systemUUID: ***
---
apiVersion: v1
kind: Node
metadata:
  annotations:
    ...
  labels:
    agentpool: connect
    name: aks-connect-***-vmss000001
    ...
spec:
  ...
  taints:
  - effect: NoExecute
    key: agentpool
    value: connect
status:
  ...
  nodeInfo:
    architecture: amd64
    bootID: ***
    containerRuntimeVersion: containerd://1.7.7-1
    kernelVersion: 5.15.0-1057-azure
    kubeProxyVersion: v1.28.5
    kubeletVersion: v1.28.5
    machineID: ***
    operatingSystem: linux
    osImage: Ubuntu 22.04.4 LTS
    systemUUID: ***
billimek commented 6 months ago

I found this issue because I had the same or a similar issue, where a node-collector pod was not scheduling to my control-plane (master) node, with an error about no matching tolerations for the node's taint.

This commit (and possibly a restart of the trivy-operator pod) got this sorted and working after a few minutes, FWIW.

From the Helm configuration:

    trivyOperator:
      scanJobTolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
chen-keinan commented 6 months ago

@KevinDW-Fluxys any luck with @billimek's suggestion?

KevinDW-Fluxys commented 6 months ago

@chen-keinan I haven't tried it, since it seems highly unlikely that it will work for us. His use case was getting something to run on the system nodepool, for which he had to tolerate a taint on the system nodepool. We are already tolerating our taints.

To be sure it wasn't a remark on the syntax, I have updated my syntax to match his and set it to simply check that the taint exists (which is already a much wider toleration), but with the same result:

scanJobTolerations:   
    - key: "agentpool"
      operator: "Exists"

I also want to point out that the vulnerability scan jobs are scheduling as expected and responding to the tolerations as desired. It's only the node-collector that isn't scheduling as expected.

chen-keinan commented 6 months ago

This is because the vulnerability scan job doesn't care which Node it runs on and the node-collector does; if you unset the node selector flag you'll get the same results.

KevinDW-Fluxys commented 6 months ago

@chen-keinan As another addition, I noticed the following in the node-collector definition when it spawns:

apiVersion: v1
kind: Pod
metadata:
  name: node-collector-649bbb854f-s5swz
  namespace: trivy-operator
spec:
  volumes:
    ...
  containers:
    - name: node-collector
      image: ***/aquasecurity/node-collector:0.1.2
      command:
        - node-collector
      args:
        - k8s
        - '--node'
        - aks-connect-***-vmss000001

The container is being passed an arg with the name of the (wrong) node
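
For anyone reproducing this, the args of the spawned pod can be pulled directly (the pod name below is a placeholder):

kubectl -n trivy-operator get pod <node-collector-pod-name> \
  -o jsonpath='{.spec.containers[0].args}'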

chen-keinan commented 6 months ago

Is the --node value different from the nodeSelector value?

KevinDW-Fluxys commented 6 months ago

@chen-keinan no, the nodeSelector value is exactly the same. If the nodeSelector is the deciding factor, then the calculation of that one is wrong, since this is a node with an untolerated taint.

If you can point me to where this logic is in the code, I can try to figure it out myself; although I'm not well versed in Go, I could still give it a go :)

chen-keinan commented 6 months ago

Once you define a toleration it will apply to all Nodes that trivy-operator reconciles. Are you suggesting setting the toleration only for Nodes with matching taints?

KevinDW-Fluxys commented 6 months ago

@chen-keinan advised me to have a look at affinity and tolerations, which might have been conflicting. I have removed the affinity, and now it's working.

For anyone else wondering, my final configuration is:

trivyOperator:
  ...
  # -- scanJobAffinity affinity to be applied to the scanner pods and node-collector
  scanJobAffinity: []
  # -- scanJobTolerations tolerations to be applied to the scanner pods and node-collector so that they can run on nodes with matching taints
  scanJobTolerations:   
    - key: "agentpool"
      operator: "Exists"
  # -- scanJobNodeSelector nodeSelector to be applied to the scanner pods so that they can run on nodes with matching labels
  scanJobNodeSelector: 
    agentpool: default

This applies to both my scan jobs and node-collectors, but there is a PR that should be merged soon which will separate the two: https://github.com/aquasecurity/trivy-operator/pull/2006
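
A quick way to verify placement after applying these values (namespace assumed to be trivy-operator; the app=node-collector label comes from the pod template shown earlier, and the NODE column shows where each node-collector pod was scheduled):

kubectl -n trivy-operator get pods -l app=node-collector -o wide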