AICoE / integration-demo-summit-2022

Summit 2022 OCTO Keynote
GNU General Public License v3.0

Investigate multi-target DevSecOps pipeline #63

Closed jimcadden closed 2 years ago

jimcadden commented 2 years ago

We need a way for the Data Scientist to easily trigger a desired DevSecOps pipeline from the notebook (e.g., dev or prod, x86 or ARM, etc.)

This will likely require:

jimcadden commented 2 years ago

Any initial thoughts? @goern @harshad16 @cooktheryan

jimcadden commented 2 years ago

Related: #62

cooktheryan commented 2 years ago

we could potentially build multi-arch images and have them both stored in quay and defined in a manifest. Similar to this https://quay.io/repository/microshift/microshift/manifest/sha256:13c1eae4b2730c1e8718f0ca06ff1d5ffec1a3e8b11f3f9c30b163020d49f0c8

I don't know how image signing would react, and the other caveat is that cross-architecture builds sometimes take a very long time.
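[Editor's note] A manifest list like the one linked above can be assembled with buildah roughly as follows. This is a sketch, assuming the per-arch images are already built and pushed; `quay.io/example/tools` is a placeholder, not the actual repo:

```shell
# create an empty manifest list, attach each per-arch image, then push the list
buildah manifest create quay.io/example/tools:latest
buildah manifest add quay.io/example/tools:latest docker://quay.io/example/tools:amd64
buildah manifest add quay.io/example/tools:latest docker://quay.io/example/tools:arm64
buildah manifest push --all quay.io/example/tools:latest docker://quay.io/example/tools:latest
```

Clients pulling `:latest` would then get the image matching their own architecture automatically, instead of having to pick an arch-specific tag.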

cooktheryan commented 2 years ago

Some more notes https://www.docker.com/blog/multi-arch-build-and-images-the-simple-way/

cooktheryan commented 2 years ago

So buildah has an option to specify arch.

buildah bud --arch arm64 --tag quay.io/rcook/tools:aarm .
buildah bud --tag quay.io/rcook/tools:amd64 .

which produces

rcook (again/api-ci-ln-jt8fh82-72292-origin-ci-int-gce-dev-rhcloud-com:6443/system:admin)  @ ~/git/summit-2021-octo-keynote
main└─ $ podman inspect quay.io/rcook/tools:amd64 | grep -i arch
        "Architecture": "amd64",
rcook (again/api-ci-ln-jt8fh82-72292-origin-ci-int-gce-dev-rhcloud-com:6443/system:admin)  @ ~/git/summit-2021-octo-keynote
main└─ $ podman inspect quay.io/rcook/tools:aarm | grep -i arch
        "Architecture": "arm64",

Our builder image just needs to have qemu-user-static. If the output above is cool with everyone, then you can assign this to me.
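[Editor's note] One common way to wire up qemu-user-static on an x86_64 build node is the binfmt registration image. A sketch, assuming a privileged container is allowed on the node:

```shell
# register qemu binfmt handlers so the kernel can run arm64 binaries during builds
sudo podman run --rm --privileged docker.io/multiarch/qemu-user-static --reset -p yes

# after that, a cross-arch build like the one above should work:
# buildah bud --arch arm64 --tag quay.io/example/tools:arm64 .
```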

harshad16 commented 2 years ago

This is something we would have to test. As Ryan suggested, we can set the arch in the build methods, so this can be included in the pipeline as well. However, I have seen in the past that the node needs these arches available to allow pods to do such actions. We can test it and then make the arch selection modular based on user input.

goern commented 2 years ago

I just wonder if encoding the arch as a tag is a good idea; don't we have multi-arch repos on quay? And just a little bit of detailing: it's aarch64, not aarm. Maybe this is normative? https://github.com/opencontainers/image-spec/blob/fe0a24978a6629f4b7159928e538dda36c7cec8e/schema/validator.go#L206

otherwise /lgtm

jimcadden commented 2 years ago

The donkeycar environments differ for the Raspberry Pi and Jetson car models, so we will likely need multiple "ARM" image build targets. Is this possible with buildah?

cooktheryan commented 2 years ago

@jimcadden I don't believe there will be issues with buildah as long as we can extend the pipelines. We may have to checkout specific branches or define something at runtime to let us know which image we are actually building. Providing the arch at buildah runtime shouldn't be a huge issue either

jimcadden commented 2 years ago

The simplest build solution may be to construct, ahead of time, three complete fedora:35-based donkeycar images (x86, arm64-rpi, arm64-jetson). Then, our "shortcut" build pipeline(s) would consist simply of copying the updated configuration and model into containers based off the original three (using qemu-user-static for the arm64 container builds).
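[Editor's note] A shortcut build along those lines might look like this with buildah. This is a sketch; the image names and file paths are placeholders, not the actual repo layout:

```shell
# start from the prebuilt donkeycar base and layer in only the updated
# configuration and model, without rebuilding the environment itself
ctr=$(buildah from quay.io/example/donkeycar-base:arm64-rpi)
buildah copy "$ctr" myconfig.py /app/myconfig.py
buildah copy "$ctr" models/pilot.h5 /app/models/pilot.h5
buildah commit "$ctr" quay.io/example/donkeycar:arm64-rpi
buildah rm "$ctr"
```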

What this solution doesn't provide is a complete CI-driven build of the full donkey environment and application.

Thoughts?

cooktheryan commented 2 years ago

What about having an "extended pipeline" which would build the x86, arm64-rpi, and jetson base images and then subsequently kick off the shortcut build? So our base donkeycar image builds for each of the architectures; we wait for completion of all base builds, and at the end of the run we use a GitHub bot account to generate the tag that fires the creation of the configuration/model builds using the new base.

We would potentially only need to fully redo our base images in the event there were patches to the underlying application software, or security vulnerabilities that would require us to kick off a rebuild of the base image. Maybe something weekly or quarterly.
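[Editor's note] The rough shape of that extended pipeline can be sketched in shell; `build_base` is a stand-in for the real buildah step, and the tag-push line (left commented) is how the bot account would fire the shortcut builds:

```shell
# stand-in for the real per-target base-image build step
build_base() { echo "built base for $1"; }

# all three base builds run in parallel...
for target in x86 arm64-rpi arm64-jetson; do
  build_base "$target" &
done
wait   # ...and the shortcut stage starts only after every base build finishes

# git tag "base-$(date +%Y%m%d)" && git push origin --tags   # bot-generated tag triggers shortcut pipeline
```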

jimcadden commented 2 years ago

> What about having an "extended pipeline" which would build the x86, arm64-rpi, and jetson base images and then subsequently kick off the shortcut build? So our base donkeycar image builds for each of the architectures; we wait for completion of all base builds, and at the end of the run we use a GitHub bot account to generate the tag that fires the creation of the configuration/model builds using the new base.
>
> We would potentially only need to fully redo our base images in the event there were patches to the underlying application software, or security vulnerabilities that would require us to kick off a rebuild of the base image. Maybe something weekly or quarterly.

@cooktheryan that makes sense to me.

@harshad16 how much work would it be to get a pipeline like Ryan described working in the aicoe-ci (or via custom tekton chain)?

harshad16 commented 2 years ago

@jimcadden I'm going to try Ryan's approach today. It should take a day to get the solution into the pipeline; it will take longer to actually test it. I will follow up with you and Ryan on the arm base app, as we would need the base image.

One clarification question: @cooktheryan @jimcadden can we have multiple Dockerfiles for the different arches? Or do we plan on having the same Dockerfile and multiple base images based on arch?

jimcadden commented 2 years ago

> can we have multiple Dockerfiles for the different arches? Or do we plan on having the same Dockerfile and multiple base images based on arch?

@harshad16 would one of the two options help simplify your task? I think either option is possible for us. @cooktheryan do you agree?

cooktheryan commented 2 years ago

Agreed. Either option is possible, whether we do if statements based on arch from within the container when we run the scripts, or just keep separate directories within the repository.
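[Editor's note] The first option — one script shipped in every image that branches on arch at runtime — amounts to a case statement on the machine architecture. A minimal sketch (the variant names are placeholders):

```shell
# map the machine architecture to an image variant; the optional argument
# makes the mapping easy to exercise, otherwise uname -m is used
arch_variant() {
  case "${1:-$(uname -m)}" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    *)       echo unknown ;;
  esac
}

arch_variant   # inside a container this prints amd64 or arm64
```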

harshad16 commented 2 years ago

Oh, I was going to design the pipeline restructure based on this choice. I think multiple Dockerfiles could be better to do at first, so we can experiment a little. I will add this and we can talk about it.
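[Editor's note] With the multiple-Dockerfile choice, the pipeline just needs to pick the right file per target. A sketch, assuming a hypothetical layout of `Dockerfile`, `Dockerfile.arm64-rpi`, and `Dockerfile.arm64-jetson` (these names are illustrative, not the actual repo layout):

```shell
# map a build target to its Dockerfile
dockerfile_for() {
  case "$1" in
    x86)          echo Dockerfile ;;
    arm64-rpi)    echo Dockerfile.arm64-rpi ;;
    arm64-jetson) echo Dockerfile.arm64-jetson ;;
    *)            echo "unknown target: $1" >&2; return 1 ;;
  esac
}

# the pipeline step would then look something like:
# buildah bud --arch arm64 -f "$(dockerfile_for arm64-rpi)" --tag quay.io/example/donkeycar:arm64-rpi .
```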

cooktheryan commented 2 years ago

@harshad16 @jimcadden I think once we get the aarch Dockerfile + base image assets defined, we can be a little ugly at first and then work our way towards a more fluid pipeline through iterations.

jimcadden commented 2 years ago

I've constructed an aarch base image that contains the RPi donkeycar (docker pull jcaddenibm/donkeycar_fedora:arm64) and added a new Dockerfile to the app repo: https://github.com/AICoE/summit-2021-octo-keynote/pull/66. I think this should be enough to start the aarch build pipeline.

Next step, I'll add the base container build & fedora donkeycar (currently here & here) into the summit-*-keynote repo so that we can kick off a rebuild of the base images.

harshad16 commented 2 years ago

This image was built with the arm Dockerfile:

podman inspect quay.io/aicoe/summit-2021-octo-keynote:pr-67 | grep -i arch
        "Architecture": "arm64",

This image was built with the default Dockerfile:

podman inspect quay.io/aicoe/summit-2021-octo-keynote:v0.3.1 | grep -i arch
        "Architecture": "amd64",

sesheta commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

harshad16 commented 2 years ago

The multi-target pipeline is in place and is working well. This can be closed.

sesheta commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta commented 2 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 2 years ago

@sesheta: Closing this issue.

In response to [this](https://github.com/AICoE/integration-demo-summit-2022/issues/63#issuecomment-1179379849):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.