CrowdStrike / falcon-operator

https://artifacthub.io/packages/olm/falcon-operator/falcon-operator
Apache License 2.0

Container Sensor can't be deployed in GKE Autopilot #479

Closed ivanaguilario closed 5 months ago

ivanaguilario commented 5 months ago

I'm trying to deploy the FalconContainer resource into a GKE Autopilot cluster, but it seems the Autopilot admission controllers reject the generated Deployment.

Here's the manifest I'm trying to deploy:

apiVersion: falcon.crowdstrike.com/v1alpha1
kind: FalconContainer
metadata:
  name: falcon-sidecar-sensor
  namespace: falcon-operator
spec:
  falcon:
    cid: '${crowdstrike_cid}'
    tags:
      - '${crowdstrike_company_tag}'
    trace: info
  falcon_api:
    cid: '${crowdstrike_cid}'
    client_id: '${crowdstrike_client_id}'
    client_secret: '${crowdstrike_client_secret}'
    cloud_region: autodiscover
  registry:
    type: gcr
  injector:
    replicas: 2
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi
    sensorResources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 128Mi

And I'm getting the following error:

failed to reconcile injector Deployment: failed to create Deployment falcon-sidecar-injector in namespace falcon-system: admission webhook "warden-validating.common-webhooks.networking.gke.io" denied the request: GKE Warden rejected the request because it violates one or more constraints.
Violations details: {"[denied by autogke-node-affinity-selector-limitation]":["Key 'node-role.kubernetes.io/master' is not allowed with node affinity; Autopilot only allows labels with keys: cloud.google.com/compute-class,cloud.google.com/gke-spot,cloud.google.com/gke-placement-group,topology.kubernetes.io/region,topology.kubernetes.io/zone,failure-domain.beta.kubernetes.io/region,failure-domain.beta.kubernetes.io/zone,cloud.google.com/gke-os-distribution,kubernetes.io/os,kubernetes.io/arch,cloud.google.com/private-node,sandbox.gke.io/runtime,cloud.google.com/gke-accelerator,cloud.google.com/gke-accelerator-count,iam.gke.io/gke-metadata-server-enabled."]}
Requested by user: 'system:serviceaccount:falcon-operator:falcon-operator-controller-manager', groups: 'system:serviceaccounts,system:serviceaccounts:falcon-operator,system:authenticated'.

Any ideas on what might be going on?

My guess is that the Deployment created by the operator adds tolerations or affinity selectors containing node-role.kubernetes.io/master, but I don't see any way to remove them through the FalconContainer resource.
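One way to test that guess is to compare the node affinity keys in a Deployment manifest against the allowlist quoted in the Warden error. The following is a minimal Python sketch, not part of the operator: the function name `disallowed_affinity_keys` and the `EXAMPLE_DEPLOYMENT` dict are hypothetical and only illustrate the shape of such a check. The example manifest is invented for the demonstration and is not the operator's actual output.

```python
# Allowed node affinity keys, copied from the Warden error message above.
ALLOWED_AFFINITY_KEYS = {
    "cloud.google.com/compute-class",
    "cloud.google.com/gke-spot",
    "cloud.google.com/gke-placement-group",
    "topology.kubernetes.io/region",
    "topology.kubernetes.io/zone",
    "failure-domain.beta.kubernetes.io/region",
    "failure-domain.beta.kubernetes.io/zone",
    "cloud.google.com/gke-os-distribution",
    "kubernetes.io/os",
    "kubernetes.io/arch",
    "cloud.google.com/private-node",
    "sandbox.gke.io/runtime",
    "cloud.google.com/gke-accelerator",
    "cloud.google.com/gke-accelerator-count",
    "iam.gke.io/gke-metadata-server-enabled",
}


def disallowed_affinity_keys(deployment):
    """Return node affinity keys in a Deployment dict that are not in the allowlist."""
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    node_affinity = (pod_spec.get("affinity") or {}).get("nodeAffinity") or {}

    # Collect selector terms from both the required and the preferred rules.
    required = node_affinity.get("requiredDuringSchedulingIgnoredDuringExecution") or {}
    terms = list(required.get("nodeSelectorTerms") or [])
    for preferred in node_affinity.get("preferredDuringSchedulingIgnoredDuringExecution") or []:
        terms.append(preferred.get("preference") or {})

    found = []
    for term in terms:
        for expr in term.get("matchExpressions") or []:
            key = expr.get("key")
            if key not in ALLOWED_AFFINITY_KEYS and key not in found:
                found.append(key)
    return found


# Hypothetical manifest fragment, invented for illustration only.
EXAMPLE_DEPLOYMENT = {
    "spec": {"template": {"spec": {"affinity": {"nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {"nodeSelectorTerms": [
            {"matchExpressions": [
                {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
            ]},
        ]},
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {"weight": 1, "preference": {"matchExpressions": [
                {"key": "node-role.kubernetes.io/master", "operator": "DoesNotExist"},
            ]}},
        ],
    }}}}},
}

print(disallowed_affinity_keys(EXAMPLE_DEPLOYMENT))
```

The check only looks at `matchExpressions` under `nodeAffinity`; tolerations and `nodeSelector` would need their own handling if Warden rejects those as well.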

redhatrises commented 5 months ago

Hello,

The sidecar shouldn't be used on GKE Autopilot. Please use the node sensor instead. Example config:

apiVersion: falcon.crowdstrike.com/v1alpha1
kind: FalconNodeSensor
metadata:
  name: falcon-node-sensor
  namespace: falcon-operator
spec:
  falcon:
    tags:
      - '${crowdstrike_company_tag}'
    trace: info
  falcon_api:
    client_id: '${crowdstrike_client_id}'
    client_secret: '${crowdstrike_client_secret}'
    cloud_region: autodiscover
  node:
    backend: bpf
    gke:
      autopilot: true
    resources:
      requests:
        cpu: <min 250m | default 750m>
        memory: <min 500Mi | default 1.5Gi>
    tolerations:
      - effect: NoSchedule
        operator: Equal
        key: kubernetes.io/arch
        value: amd64
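
The `resources.requests` placeholders above mention a minimum of 250m CPU and 500Mi memory. As a rough sketch, assuming those two minimums are the only constraints, the values can be checked before applying the manifest. The helper names (`parse_cpu`, `parse_memory`, `meets_minimums`) are hypothetical, and the parser covers only a subset of Kubernetes quantity suffixes.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; exponent notation is not handled.
_MEMORY_FACTORS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '250m' or '1.5' to cores."""
    if quantity.endswith("m"):
        return Decimal(quantity[:-1]) / 1000
    return Decimal(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '500Mi' or '1.5Gi' to bytes."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)


# Minimums taken from the placeholders in the example config above.
MIN_CPU = parse_cpu("250m")
MIN_MEMORY = parse_memory("500Mi")


def meets_minimums(requests):
    """Return True if a resources.requests dict satisfies both minimums."""
    return (
        parse_cpu(requests["cpu"]) >= MIN_CPU
        and parse_memory(requests["memory"]) >= MIN_MEMORY
    )


print(meets_minimums({"cpu": "750m", "memory": "1.5Gi"}))
print(meets_minimums({"cpu": "100m", "memory": "128Mi"}))
```

The second call uses the values from the original FalconContainer manifest, which fall below both minimums listed for the node sensor.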

ivanaguilario commented 5 months ago

Hi @redhatrises, thanks for your response.

That's what we initially tried, but the node sensor is generating a lot of errors in the logs. From this comment, I assume GCOS is not supported (not sure if there's an update on that).

Unfortunately, GKE Autopilot only uses GCOS, and there is currently no way to switch to another node image.

Is that still the case, or are the errors normal and expected? I've attached a log file to the comment so you can see the errors.

Thanks.

redhatrises commented 5 months ago

I have removed the logs as sensor logs shouldn't be provided via this forum.

> I assume GCOS is not supported (not sure if there's an update on that).

GCOS has been supported for a while now using eBPF (not kernel mode). If you are concerned about the log messages, I would check with support, as some errors are benign.

For Autopilot, the node sensor method should be used instead of the sidecar. https://cloud.google.com/kubernetes-engine/docs/resources/autopilot-partners#allowlisted-partner-workloads

ivanaguilario commented 5 months ago

@redhatrises thank you very much for your answers, that's clearer now!

I'll close the issue, as it seems that it is indeed working. I was just missing some info about it.

Thanks!