StatCan / aaw-contrib-containers

Containers to be used for general purpose Data Science

feat(security-scanning): image for scanning #87

Closed Jose-Matsuda closed 2 years ago

Jose-Matsuda commented 2 years ago

Temporary measure to bypass credential setting on the new repo. It does not do any updates; it will just print data to the console.

Dependent on https://github.com/StatCan/daaas/issues/960, as step b will not actually pull the image in its entirety; it will just be a .marker file, and will thus not be scanned by XRAY.

Jose-Matsuda commented 2 years ago

Yeah I agree it's a bit hard to follow; it's a lot easier when you look at the outputs of each step. There is some example output here https://github.com/Jose-Matsuda/artifactory-cleanup/tree/main/2022/z-Example-outputs (though it might not match completely anymore, this was the general idea).

At this point the outputs would just be the affected images, or images not found.

vexingly commented 2 years ago

Those are really helpful and interesting, thanks! One thing from the example that I am wondering about: any concerns when it picks up an artifact using the 'latest' tag? Could we get into a scenario where the 'latest' in the vulnerability report is not the same image as the 'latest' in a notebook? Not sure if it's a valid concern or whether that's even possible in an actual notebook.

Jose-Matsuda commented 2 years ago

Very valid concern. After the discussion we had in https://github.com/StatCan/daaas/issues/961, we did decide to eventually move to a pattern that will actually have a long-lived tag (similar to latest). Using that tag, we will also have, say, weekly updates to everyone's notebook images to get them on the latest version of what we have deployed in aaw-kubeflow-containers. Admittedly, before this and the research I was not considering using a long-lived tag.

With the long-lived tag (say a v1), when we push say jupyterlab-cpu:v1 to the ACR, it will clobber and annihilate the previous jupyterlab-cpu:v1 tag, which makes me now see where you are coming from. There can be a case where the image on a notebook's pod is different from what we pull from the ACR. For example, on a Monday we fix a critical CVE on kubeflow-containers and push to main, and that will update the v1 tag. Then this cronjob runs nightly and bam, it pulls the new jupyterlab-cpu:v1 and scans that for CVEs, while a notebook pod started before the push would still be running the old image under the same tag.
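To make that scenario concrete, here is a rough Python sketch. It uses a toy in-memory registry, not the real ACR or XRAY APIs; the class `ToyRegistry` and the image contents are made up purely for illustration. It only shows that a re-pushed tag resolves to a different digest than the one an already-running pod pulled.

```python
import hashlib


class ToyRegistry:
    """Hypothetical in-memory stand-in for a registry with mutable tags."""

    def __init__(self):
        self.tags = {}  # maps "name:tag" -> content digest

    def push(self, ref: str, content: bytes) -> str:
        # A digest is derived from the image content, so it never changes
        # for a given image; the tag is just a movable pointer to it.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.tags[ref] = digest  # pushing an existing tag overwrites it
        return digest

    def resolve(self, ref: str) -> str:
        return self.tags[ref]


registry = ToyRegistry()

# Last week: v1 is pushed and a notebook pod pulls it.
old_digest = registry.push("jupyterlab-cpu:v1", b"image with critical CVE")
pod_image_digest = registry.resolve("jupyterlab-cpu:v1")

# Monday: the CVE fix is pushed under the same long-lived tag.
new_digest = registry.push("jupyterlab-cpu:v1", b"image with CVE fixed")

# Nightly job: pulls by tag and scans whatever the tag points to now.
scanned_digest = registry.resolve("jupyterlab-cpu:v1")

scan_matches_pod = scanned_digest == pod_image_digest
print("scan covers the image the pod is running:", scan_matches_pod)
```

In this toy run the scan sees the fixed image, so it would report clean even though the pod is still on the vulnerable one.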

For our current implementation it is not a problem, but when we do move towards that we will need to consider pulling by the digest rather than the tag: dockerimage@sha256:
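As a sketch of what "by the digest and not the tag" could look like on the scanning side, the Python below splits an image reference into repository, tag, and digest, and reports whether it is pinned. The helper names (`parse_image_ref`, `is_pinned`) and the registry host `example.azurecr.io` are assumptions for illustration, and the parsing is simplified compared with the full image reference grammar.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'repo[:tag][@sha256:...]' into its parts (simplified)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    last_slash = ref.rfind("/")
    last_colon = ref.rfind(":")
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as localhost:5000.
    if last_colon > last_slash:
        ref, tag = ref[:last_colon], ref[last_colon + 1:]
    return ImageRef(ref, tag, digest)


def is_pinned(ref: str) -> bool:
    """True when the reference names an immutable digest."""
    return parse_image_ref(ref).digest is not None


# Hypothetical references; the digest is a dummy value of the right shape.
by_tag = "example.azurecr.io/jupyterlab-cpu:v1"
by_digest = "example.azurecr.io/jupyterlab-cpu@sha256:" + "a" * 64

for candidate in (by_tag, by_digest):
    print(candidate, "-> pinned:", is_pinned(candidate))
```

A scanning job could use a check like this to flag references that are still tag-only, since only a digest-pinned reference is guaranteed to name the same image that a pod actually pulled.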