ComplianceAsCode / compliance-operator

Operator providing Kubernetes cluster compliance checks
Apache License 2.0

OCPBUGS-19690: Enable host network to access host sysctls #497

Closed yuumasato closed 1 month ago

yuumasato commented 5 months ago

EDIT: I have re-tested and `DNSPolicy: ClusterFirstWithHostNet` indeed solves the `no such host` error when trying to upload to `resultserver`.
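
For context, here is a minimal sketch of the two pod-spec fields discussed in this PR. It is an illustrative standalone pod, not the operator's actual scanner pod definition; the pod name, image, and command are placeholders. In Kubernetes, a pod with `hostNetwork: true` and the default `ClusterFirst` DNS policy falls back to the node's resolver, which would be consistent with the Service name lookup failing against `10.0.0.2:53`.

```yaml
# Illustrative fragment only; names and image are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-dns-example
spec:
  # Share the node's network namespace so net.* sysctls reflect the host.
  hostNetwork: true
  # Keep resolving cluster Service names through cluster DNS
  # even though the pod uses the host network.
  dnsPolicy: ClusterFirstWithHostNet
  containers:
  - name: scanner
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
```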

openshift-ci-robot commented 5 months ago

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to [this](https://github.com/ComplianceAsCode/compliance-operator/pull/497):

>* With `HostNetwork: true` the sysctl `net.core.bpf_jit_harden` becomes visible to the `scanner` container.
> Below is a pod that has access to the sysctls:
>```yaml
>apiVersion: v1
>kind: Pod
>metadata:
>  name: list-sysctls
>spec:
>  hostNetwork: true
>  volumes:
>  - name: host
>    hostPath:
>      path: /
>      type: Directory
>  containers:
>  - name: list
>    command:
>    - cat
>    - /host/proc/sys/net/ipv6/conf/all/accept_ra
>    - /host/proc/sys/net/core/bpf_jit_harden
>    image: registry.access.redhat.com/ubi8/ubi-minimal
>    securityContext:
>      runAsUser: 0
>      privileged: true
>    volumeMounts:
>    - name: host
>      mountPath: /host
>```
>`$ oc create -f list-syctls-proc.yaml`
>`$ oc logs list-sysctls `
>
>* But with `HostNetwork: true`, the CO fails to upload to `resultserver`.
>`
>{"level":"info","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Trying to upload to resultserver","url":"https://upstream-rhcos4-high-worker-rs:8443/"}
>{"level":"error","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Failed to upload results to server","error":"Post \"https://upstream-rhcos4-high-worker-rs:8443/\": dial tcp: lookup upstream-rhcos4-high-worker-rs on 10.0.0.2:53: no such host","stacktrace":"github.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:316\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer.Operation.withEmptyData.func1\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:18\ngithub.com/cenkalti/backoff/v4.doRetryNotify[...]\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:88\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:61\ngithub.com/cenkalti/backoff/v4.RetryNotify\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:49\ngithub.com/cenkalti/backoff/v4.Retry\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:38\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:299\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.handleCompleteSCAPResults.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:390"}
>`
>* `DNSPolicy: ClusterFirstWithHostNet` is my unsuccessful attempt to fix that.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=ComplianceAsCode%2Fcompliance-operator). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

Vincent056 commented 5 months ago

nice finding!

BhargaviGudi commented 5 months ago

/hold for test

BhargaviGudi commented 4 months ago

Verification passed with 4.16.0-0.nightly-2024-04-16-195622, compliance-operator built from PR #497, and content built from ComplianceAsCode/content PR #11722.

  1. Install CO
    $ oc get pb
    NAME              CONTENTIMAGE                                 CONTENTFILE         STATUS
    ocp4              ghcr.io/complianceascode/k8scontent:latest   ssg-ocp4-ds.xml     VALID
    rhcos4            ghcr.io/complianceascode/k8scontent:latest   ssg-rhcos4-ds.xml   VALID
    upstream-ocp4     ghcr.io/complianceascode/k8scontent:11722    ssg-ocp4-ds.xml     VALID
    upstream-rhcos4   ghcr.io/complianceascode/k8scontent:11722    ssg-rhcos4-ds.xml   VALID
  2. Create a custom wrscan MCP
  3. Create auto-rem-ss to scan only the wrscan MCP
    $ oc get ss auto-rem-ss -oyaml
    apiVersion: compliance.openshift.io/v1alpha1
    autoApplyRemediations: true
    autoUpdateRemediations: true
    kind: ScanSetting
    maxRetryOnTimeout: 3
    metadata:
      annotations:
        kubectl.kubernetes.io/last-applied-configuration: |
          {"apiVersion":"compliance.openshift.io/v1alpha1","autoApplyRemediations":true,"autoUpdateRemediations":true,"kind":"ScanSetting","maxRetryOnTimeout":3,"metadata":{"annotations":{},"creationTimestamp":"2023-09-25T02:05:43Z","generation":1,"name":"auto-rem-ss","namespace":"openshift-compliance","resourceVersion":"43973","uid":"29426481-7cd1-48f0-a3cf-934c96f651eb"},"rawResultStorage":{"pvAccessModes":["ReadWriteOnce"],"rotation":5,"size":"2Gi"},"roles":["wrscan"],"scanTolerations":[{"operator":"Exists"}],"schedule":"0 1 * * *","showNotApplicable":false,"strictNodeScan":false,"timeout":"30m"}
      creationTimestamp: "2024-04-17T10:14:11Z"
      generation: 1
      name: auto-rem-ss
      namespace: openshift-compliance
      resourceVersion: "108142"
      uid: b3a50385-baad-43cf-8ac3-2fb3f1c502a6
    rawResultStorage:
      pvAccessModes:
      - ReadWriteOnce
      rotation: 5
      size: 2Gi
    roles:
    - wrscan
    scanTolerations:
    - operator: Exists
    schedule: 0 1 * * *
    showNotApplicable: false
    strictNodeScan: false
    suspend: false
    timeout: 30m
  4. Create ssb
    $ oc compliance bind -N rhcos4-high-test -S auto-rem-ss profile/upstream-rhcos4-high
    Creating ScanSettingBinding rhcos4-high-test
    $ oc get scan
    NAME                          PHASE   RESULT
    upstream-rhcos4-high-wrscan   DONE    NON-COMPLIANT
  5. All rules with auto-remediations are applied after 3 rounds of rescans.
    $ oc compliance rerun-now scansettingbinding rhcos4-high-test
    Rerunning scans from 'rhcos4-high-test': upstream-rhcos4-high-wrscan
    Re-running scan 'openshift-compliance/upstream-rhcos4-high-wrscan'
    $ oc get ccr -l compliance.openshift.io/automated-remediation=,compliance.openshift.io/check-status=FAIL  
    No resources found in openshift-compliance namespace.
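
For readers without the `oc compliance` plugin, the `bind` command in step 4 should be roughly equivalent to applying a ScanSettingBinding such as the sketch below. The field layout is based on the operator's `v1alpha1` API as documented; the namespace is an assumption taken from the ScanSetting shown in step 3, and the object the plugin actually created was not captured in this thread.

```yaml
# Sketch of the binding created in step 4; namespace assumed from step 3.
apiVersion: compliance.openshift.io/v1alpha1
kind: ScanSettingBinding
metadata:
  name: rhcos4-high-test
  namespace: openshift-compliance
profiles:
# Profile built from the upstream content image (k8scontent:11722)
- name: upstream-rhcos4-high
  kind: Profile
  apiGroup: compliance.openshift.io/v1alpha1
settingsRef:
  # ScanSetting with autoApplyRemediations enabled, limited to the wrscan role
  name: auto-rem-ss
  kind: ScanSetting
  apiGroup: compliance.openshift.io/v1alpha1
```
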
BhargaviGudi commented 4 months ago

/unhold

BhargaviGudi commented 4 months ago

/label qe-approved

BhargaviGudi commented 4 months ago

/lgtm

yuumasato commented 4 months ago

@BhargaviGudi Thank you for testing this.

I re-tested and can no longer reproduce the error I mentioned in the PR description. I was probably doing something wrong before.

yuumasato commented 4 months ago

Below are some of the collected runtime objects; they now match the static configuration.

<unix-sys:sysctl_item id="100008715" status="exists">
  <unix-sys:name>net.core.bpf_jit_harden</unix-sys:name>
  <unix-sys:value>2</unix-sys:value>
</unix-sys:sysctl_item>
<ind-sys:textfilecontent_item id="100008714" status="exists">
  <ind-sys:filepath>/etc/sysctl.d/75-sysctl_net_core_bpf_jit_harden.conf</ind-sys:filepath>
  <ind-sys:path>/etc/sysctl.d</ind-sys:path>
  <ind-sys:filename>75-sysctl_net_core_bpf_jit_harden.conf</ind-sys:filename>
  <ind-sys:pattern>^[\s]*net.core.bpf_jit_harden[\s]*=[\s]*(.*)[\s]*$</ind-sys:pattern>
  <ind-sys:instance datatype="int">1</ind-sys:instance>
  <ind-sys:line>^[\s]*net.core.bpf_jit_harden[\s]*=[\s]*(.*)[\s]*$</ind-sys:line>
  <ind-sys:text>net.core.bpf_jit_harden=2</ind-sys:text>
  <ind-sys:subexpression>2</ind-sys:subexpression>
</ind-sys:textfilecontent_item>
<unix-sys:sysctl_item id="100008621" status="exists">
  <unix-sys:name>net.ipv6.conf.default.accept_ra</unix-sys:name>
  <unix-sys:value>0</unix-sys:value>
</unix-sys:sysctl_item>
<ind-sys:textfilecontent_item id="100008620" status="exists">
  <ind-sys:filepath>/etc/sysctl.d/75-sysctl_net_ipv6_conf_default_accept_ra.conf</ind-sys:filepath>
  <ind-sys:path>/etc/sysctl.d</ind-sys:path>
  <ind-sys:filename>75-sysctl_net_ipv6_conf_default_accept_ra.conf</ind-sys:filename>
  <ind-sys:pattern>^[\s]*net.ipv6.conf.default.accept_ra[\s]*=[\s]*(.*)[\s]*$</ind-sys:pattern>
  <ind-sys:instance datatype="int">1</ind-sys:instance>
  <ind-sys:line>^[\s]*net.ipv6.conf.default.accept_ra[\s]*=[\s]*(.*)[\s]*$</ind-sys:line>
  <ind-sys:text>net.ipv6.conf.default.accept_ra=0</ind-sys:text>
  <ind-sys:subexpression>0</ind-sys:subexpression>
</ind-sys:textfilecontent_item>
openshift-ci[bot] commented 2 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BhargaviGudi, Vincent056, yuumasato

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

- ~~[OWNERS](https://github.com/ComplianceAsCode/compliance-operator/blob/master/OWNERS)~~ [BhargaviGudi,Vincent056]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

yuumasato commented 2 months ago

/jira refresh

openshift-ci-robot commented 2 months ago

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is invalid:

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to [this](https://github.com/ComplianceAsCode/compliance-operator/pull/497#issuecomment-2181077273):

>/jira refresh

yuumasato commented 2 months ago

/jira refresh

openshift-ci-robot commented 2 months ago

@yuumasato: This pull request references Jira Issue OCPBUGS-19690, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @xiaojiey

In response to [this](https://github.com/ComplianceAsCode/compliance-operator/pull/497#issuecomment-2181106967):

>/jira refresh

rhmdnd commented 1 month ago

The ROSA failure here looks like a provisioning/setup issue before the test even runs. Attempting to recheck since I'm not convinced the failure is due to this patch.

rhmdnd commented 1 month ago

/test e2e-rosa

yuumasato commented 1 month ago

Rebased onto the latest master; let's see how testing goes.

github-actions[bot] commented 1 month ago

:robot: To deploy this PR, run the following command:

make catalog-deploy CATALOG_IMG=ghcr.io/complianceascode/compliance-operator-catalog:497
Vincent056 commented 1 month ago

/lgtm

openshift-ci-robot commented 1 month ago

@yuumasato: Jira Issue OCPBUGS-19690: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-19690 has been moved to the MODIFIED state.

In response to [this](https://github.com/ComplianceAsCode/compliance-operator/pull/497):

>* With `HostNetwork: true` the sysctl `net.core.bpf_jit_harden` becomes visible to the `scanner` container.
> Below is a pod that has access to the sysctls:
>```yaml
>apiVersion: v1
>kind: Pod
>metadata:
>  name: list-sysctls
>spec:
>  hostNetwork: true
>  volumes:
>  - name: host
>    hostPath:
>      path: /
>      type: Directory
>  containers:
>  - name: list
>    command:
>    - cat
>    - /host/proc/sys/net/ipv6/conf/all/accept_ra
>    - /host/proc/sys/net/core/bpf_jit_harden
>    image: registry.access.redhat.com/ubi8/ubi-minimal
>    securityContext:
>      runAsUser: 0
>      privileged: true
>    volumeMounts:
>    - name: host
>      mountPath: /host
>```
>`$ oc create -f list-syctls-proc.yaml`
>`$ oc logs list-sysctls `
>
>* `DNSPolicy: ClusterFirstWithHostNet` allows the CO to upload to `resultserver`, otherwise we get the following error:
>`
>{"level":"info","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Trying to upload to resultserver","url":"https://upstream-rhcos4-high-worker-rs:8443/"}
>{"level":"error","ts":"2024-03-15T18:45:57Z","logger":"cmd","msg":"Failed to upload results to server","error":"Post \"https://upstream-rhcos4-high-worker-rs:8443/\": dial tcp: lookup upstream-rhcos4-high-worker-rs on 10.0.0.2:53: no such host","stacktrace":"github.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:316\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer.Operation.withEmptyData.func1\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:18\ngithub.com/cenkalti/backoff/v4.doRetryNotify[...]\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:88\ngithub.com/cenkalti/backoff/v4.RetryNotifyWithTimer\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:61\ngithub.com/cenkalti/backoff/v4.RetryNotify\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:49\ngithub.com/cenkalti/backoff/v4.Retry\n\tgithub.com/cenkalti/backoff/v4@v4.2.1/retry.go:38\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.uploadToResultServer\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:299\ngithub.com/ComplianceAsCode/compliance-operator/cmd/manager.handleCompleteSCAPResults.func1\n\tgithub.com/ComplianceAsCode/compliance-operator/cmd/manager/resultcollector.go:390"}
>`
>
>* Use the content from https://github.com/ComplianceAsCode/content/pull/11722, to check whether the `scanner` container can access the sysctls correctly.
> ` oc compliance bind -S default-auto-apply -N test profile/upstream-rhcos4-moderate`
>
>EDIT: I have re-tested and `DNSPolicy: ClusterFirstWithHostNet` indeed solves the `no such host` error when trying to upload to `resultserver`.