canonical / charmed-kubeflow-workflows

Repository that contains GitHub workflows and shareable configs for Charmed Kubeflow

Create a re-usable workflow for oci-image scans #69

Open DnPlas opened 2 days ago

DnPlas commented 2 days ago

Context

As the team grows its offerings, security vulnerabilities must be scanned for and reported effectively so the team can address them in a timely manner. Currently, the only repository that has a GitHub workflow for scanning oci-images and getting reports is canonical/bundle-kubeflow, using the scan-images.yaml workflow. While it works correctly at the moment, this workflow has the following limitations:

  1. Uses a local script to gather the images used in all charm repositories that form the bundle. At the same time, the get-all-images.py script depends on scripts present in each repository to generate a list of images per repo. The problem with this is that 1) not all repos have this script (e.g. mlflow), and 2) this script is tightly coupled to the host repo.
  2. The Scan images step of the workflow depends on two scripts located at canonical/kubeflow-ci. This is problematic because 1) it creates a maintenance task, and 2) the scripts do something that actions like aquasecurity/trivy-action@0.20.0 already provide.
  3. The workflow is not re-usable as is, meaning it cannot be used by the mlflow-operator repository.

Proposal

Create a re-usable workflow for scanning oci-images that:

  1. Uses the aquasecurity/trivy-action@0.20.0 to scan and generate reports for each of the images under scan
  2. Uploads each of the Trivy reports as artefacts of the GitHub workflow run
  3. Automatically reports vulnerabilities via GitHub issues in the rock's repository (i.e. canonical/training-operator-rocks, canonical/kubeflow-rocks)
  4. Updates the existing issue with the latest details of the report if the vulnerability for a specific image has already been reported
  5. Runs on schedule, but also provides a workflow dispatch
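
The create-or-update behaviour in item 4 can be sketched roughly as below. This is only an illustration, not the proposed implementation: the title format, the `issue_title` and `plan_issue_action` helpers, and the shape of the `open_issues` list (dicts with `number` and `title`) are all assumptions made for the example. The idea is that a deterministic title per image lets a later run find the issue it opened earlier.

```python
from typing import Optional


def issue_title(image: str) -> str:
    """Build a deterministic title so a later run can find the same issue again.

    The title format is hypothetical; any stable, per-image string would do.
    """
    return f"Vulnerabilities found in {image}"


def plan_issue_action(image: str, open_issues: list) -> dict:
    """Decide whether to create a new issue or update an existing one.

    `open_issues` is assumed to be a list of dicts with "number" and "title"
    keys, e.g. as fetched beforehand from the rock's repository.
    """
    title = issue_title(image)
    for issue in open_issues:
        if issue["title"] == title:
            # Already reported: refresh the existing issue instead of duplicating it.
            return {"action": "update", "number": issue["number"], "title": title}
    number: Optional[int] = None
    return {"action": "create", "number": number, "title": title}
```

Matching on the title keeps the sketch simple; a label or a hidden marker in the issue body would work equally well as the lookup key.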

Please NOTE that part of this proposal is to only scan images that the Analytics team maintains. This is because the upstream images that charms use cannot be patched by us.

Limitations

  1. There is no other way of fetching the images that each charm uses, so for now we'll stick to using the get-all-images.py script.
  2. The Trivy reports will be uploaded individually, meaning that there is a linear relation between the number of scanned images and the number of artefacts saved in the workflow run.
  3. ~~BIGGEST This workflow will be coupling the product to the rocks, that is, the scans are done far from the source code. Ideally we'd have scans and vulnerability reports at the rockcraft project repositories. For this one, though, we could plan to push rocks to the oci-factory and outsource all the vulnerability scans and reports.~~ The workflows will live at rocks repo level, so this is not a limitation anymore.
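
Since each report becomes its own artefact, every image needs a distinct artefact name, and image references contain characters such as `/` and `:` that artefact names do not accept. A minimal sketch of one way to derive such a name follows; the `artefact_name` helper, the `trivy-report-` prefix, and the image reference used in the example are hypothetical.

```python
import re


def artefact_name(image: str) -> str:
    """Turn an image reference into a name free of '/', ':' and '@'.

    Every run of characters outside [A-Za-z0-9._-] is collapsed into one '-'.
    """
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", image).strip("-")
    return "trivy-report-" + safe


# Hypothetical image reference, used only to illustrate the mapping.
name = artefact_name("charmedkubeflow/training-operator:1.7-22.04")
```

With N images scanned, the run ends up with N artefacts named this way, which is the linear relation described above.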

Out of scope

  1. Automatic notifications in mailing list or MM
  2. snap or charm scans - though common workflows can be used in other automations, for example, for creating GH issues.

Example

  1. Scanning an image - this is an example run. The vulnerability scan job will fail if it finds a CRITICAL or HIGH vulnerability, and it will report an issue.
  2. The workflow - this is what the workflow would look like, just with a bit of work to make it 100% product agnostic.
  3. Automatic issue creation - this is an example of an issue that will be created automatically by the workflow. It currently uses my GH token, that's why I'm the reporter, but ideally we'll use the CKF bot for it.
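
The fail-on-CRITICAL-or-HIGH rule from the first example can be illustrated with a short sketch that inspects a Trivy JSON report. It assumes the usual report layout (a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field); the sample report, the CVE identifiers, and the `blocking_vulnerabilities` helper are made up for the example. In the workflow itself this gating would more likely be left to the scan action's own severity and exit-code settings.

```python
import json

# Severities that should make the scan job fail, as described in the example run.
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def blocking_vulnerabilities(report: dict) -> list:
    """Return the vulnerabilities in a Trivy JSON report that should fail the job."""
    found = []
    # "Results" or "Vulnerabilities" may be missing or null when nothing was found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING_SEVERITIES:
                found.append(vuln)
    return found


# Made-up report fragment; the identifiers below are placeholders, not real CVEs.
sample_report = json.loads("""
{
  "Results": [
    {"Target": "example-rock (ubuntu 22.04)",
     "Vulnerabilities": [
       {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
       {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}
     ]},
    {"Target": "usr/bin/example", "Vulnerabilities": null},
    {"Target": "python-pkgs",
     "Vulnerabilities": [
       {"VulnerabilityID": "CVE-0000-0003", "Severity": "CRITICAL"}
     ]}
  ]
}
""")

blocking = blocking_vulnerabilities(sample_report)
# A non-zero exit code is what would mark the scan job as failed.
exit_code = 1 if blocking else 0
```

The list returned here is also the natural input for the body of the automatically created issue.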

What needs to get done

Create a re-usable workflow for getting the images used by any rock, scanning them for vulnerabilities, and reporting any vulnerabilities found, following the example in https://github.com/canonical/bundle-kubeflow/pull/1087/files#diff-327280cbc65c9de9998db8b0e5d1c937ccf75524907e5f9d026304ca85146f53

Definition of Done

There is a re-usable workflow that any of the charming products of this team can use.

syncronize-issues-to-jira[bot] commented 2 days ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-6331.

This message was autogenerated

DnPlas commented 1 day ago

Based on feedback from @misohu, a better way to approach this enhancement proposal is to have the scans closer to the source (each rock repository) instead of in a central place. @misohu also pointed out that rocks are already being scanned on_push and on_pull by the canonical/charmed-kubeflow-workflows/.github/workflows/get-rocks-modified-and-build-scan-test-publish.yaml@main workflow, so scans are already happening at the rock level, but vulnerabilities are not being reported and are not being constantly tested for. I am editing the original proposal in the description of this issue to match the above.