Closed Jose-Matsuda closed 3 years ago
The first step is pretty easy. Artifactory itself scans artifacts as they come in, and when a new vulnerability is added to its database it triggers scans for it across any available artifacts.
You need to set up a watch on the repository (or repositories) you want to monitor, but after that you can run a quick call to the Xray API via
curl -u myuser:passwordhere -X POST $URL -H "Content-Type: application/json" -d @4-violationscheck.json >> 4-violations.json
where the URL may look something like URL="https://testjosez.jfrog.io/xray/api/v1/violations"
and 4-violationscheck.json is a file like
{
  "filters": {
    "min_severity": "Critical"
  }
}
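If you want to narrow the results further, the Xray "Get Violations" request body also accepts other filter and pagination fields. A sketch of a richer filter file, assuming `watch_name` and `pagination` are supported by your Xray version (check against its API docs; the watch name is hypothetical); the call itself is unchanged, only the `-d @file` argument points at this file instead:

```shell
# Sketch: a richer violations filter file. The watch_name value is a
# placeholder, and the pagination fields are assumptions to verify
# against your Xray version's REST API documentation.
cat > 4-violationscheck-watch.json <<'EOF'
{
  "filters": {
    "watch_name": "my-repo-watch",
    "min_severity": "Critical"
  },
  "pagination": {
    "limit": 100,
    "offset": 1
  }
}
EOF

# quick sanity check that the file is valid JSON
jq -e '.filters.min_severity' 4-violationscheck-watch.json
```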
Example output of that call is found here; it is messy but valid JSON.
With a quick bit of formatting
cat 4-violations.json | jq -c '.violations[].impacted_artifacts[]' | sort | uniq >> 4B-impacted-artifacts.txt
sed -i 's/\"//g' 4B-impacted-artifacts.txt
# Unsure if the default/ prefix will exist in ours; here's code to remove it anyway
awk '{gsub("default/",""); print}' 4B-impacted-artifacts.txt >> 4C-formatted-impacted-artifacts.txt
# There's also a trailing slash at the end of each line for whatever reason; get rid of it
sed -i 's/\/$//' 4C-formatted-impacted-artifacts.txt
We get a file looking like
docker-quickstart-local/hello-world/latest
docker-quickstart-local/hello-world/vulnerablehope
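The three formatting commands above can also be collapsed into one pipeline, using jq -r so the quote-stripping sed step isn't needed. A self-contained sketch on sample data (the sample violations JSON below is hypothetical and its shape is assumed from the API response; it regenerates 4C-formatted-impacted-artifacts.txt):

```shell
# Sample violations output (shape assumed from the Xray response; overwrite
# warning: this clobbers any real 4-violations.json in the working directory)
cat > 4-violations.json <<'EOF'
{"violations":[
  {"impacted_artifacts":["default/docker-quickstart-local/hello-world/latest/"]},
  {"impacted_artifacts":["default/docker-quickstart-local/hello-world/vulnerablehope/"]}
]}
EOF

# extract raw strings, dedupe, strip the default/ prefix and trailing slash
jq -r '.violations[].impacted_artifacts[]' 4-violations.json \
  | sort -u \
  | sed -e 's|^default/||' -e 's|/$||' \
  > 4C-formatted-impacted-artifacts.txt

cat 4C-formatted-impacted-artifacts.txt
```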
The second task, getting images from the cluster, is again pretty easy, and the query will change depending on what you want.
If you want just notebook images (these are the curated images), then it is
kubectl get notebook --namespace jose-matsuda -o json | jq -c '.items[] | {Namespace:(.metadata.namespace), ImagePath:(.spec.template.spec.containers[0].image), Name:(.spec.template.spec.containers[0].name)}' | sort | uniq >> 2-kubectl-notebook.txt
where the --namespace jose-matsuda flag would be replaced with --all-namespaces to cover every namespace.
If you want to get a list of all images (this includes notebook images), then it is
kubectl get pods --namespace jose-matsuda -o jsonpath="{.items[*].spec.containers[*].image}" | tr -s '[[:space:]]' '\n' | sort | uniq >> 2-kubectl-pod-images.txt
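As a sanity check, here is what the notebook jq filter extracts, run against a minimal sample Notebook object (the field layout is assumed to match kubectl's JSON output; the namespace, image, and container name are hypothetical):

```shell
# Minimal sample of `kubectl get notebook -o json` output (layout assumed)
cat > sample-notebook.json <<'EOF'
{"items":[{"metadata":{"namespace":"jose-matsuda"},
 "spec":{"template":{"spec":{"containers":[
   {"name":"my-notebook","image":"k8s/jupyterlab-cpu:abc123"}]}}}}]}
EOF

# the same jq filter used against the live cluster above
jq -c '.items[] | {Namespace:(.metadata.namespace), ImagePath:(.spec.template.spec.containers[0].image), Name:(.spec.template.spec.containers[0].name)}' sample-notebook.json
```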
For the comparison: notice that in the get notebook output the ImagePath looks like 'k8s/jupyterlab-cpu:shacode', while the 'impacted_artifacts' from the previous step use slashes throughout, so we can just sed the notebook output to replace the ':' with a '/' to make an easy way to compare.
Then you can go line by line through the parsed text file containing the impacted artifacts from step 1 and grep the text file from step 3 for any hits. Something like
while read -r line
do
  # extract the image path from the line, trim the quotes, and replace the : with a /
  imageCheck=$(echo "$line" | jq -r '.ImagePath' | sed 's/:/\//g')
  # Look for the image in the impacted artifacts and, if found, print the line to the list.
  if grep -Fxq "$imageCheck" 4C-formatted-impacted-artifacts.txt
  then
    echo "$line" >> user-items.txt
  fi
done < 2-kubectl-notebook.txt
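A self-contained run of that comparison on sample data, with one vulnerable and one clean image (both file contents are hypothetical):

```shell
# Sample notebook records (hypothetical; same shape as the jq output above)
cat > 2-kubectl-notebook.txt <<'EOF'
{"Namespace":"jose-matsuda","ImagePath":"k8s/jupyterlab-cpu:abc123","Name":"my-notebook"}
{"Namespace":"jose-matsuda","ImagePath":"k8s/jupyterlab-gpu:def456","Name":"other-notebook"}
EOF

# Sample impacted-artifacts list: only the cpu image is vulnerable
cat > 4C-formatted-impacted-artifacts.txt <<'EOF'
k8s/jupyterlab-cpu/abc123
EOF

rm -f user-items.txt
while read -r line
do
  # turn k8s/jupyterlab-cpu:abc123 into k8s/jupyterlab-cpu/abc123 for comparison
  imageCheck=$(echo "$line" | jq -r '.ImagePath' | sed 's/:/\//g')
  if grep -Fxq "$imageCheck" 4C-formatted-impacted-artifacts.txt
  then
    echo "$line" >> user-items.txt
  fi
done < 2-kubectl-notebook.txt

cat user-items.txt
```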
For the curated and user-workload images, persisting data may take the path of using Property Sets in Artifactory, as those are confirmed to be in the local repository (property sets are only available on artifacts stored there). This assumes that those are the only kinds of images we need to worry about updating or deleting for a vulnerability. We could likely just attach a "delete-on-date" property that contains a YMD value (e.g. 20210421) and use that in the program.
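Such a property could be attached with Artifactory's Set Item Properties endpoint. A sketch; the host, repo path, and the property name "delete-on-date" are all assumptions, and the actual call is commented out since it needs real credentials:

```shell
# Sketch: compute a delete-on date 7 days out in the YMD format mentioned above.
# (GNU date syntax; on BSD/macOS it would be `date -v+7d +%Y%m%d`.)
DELETE_ON=$(date -d "+7 days" +%Y%m%d)
echo "delete-on-date=$DELETE_ON"

# The actual call (Artifactory Set Item Properties API); host and repo path
# are placeholders:
# curl -u myuser:passwordhere -X PUT \
#   "https://testjosez.jfrog.io/artifactory/api/storage/docker-quickstart-local/hello-world/latest?properties=delete-on-date=$DELETE_ON"
```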
If we must also delete images that are proxied (these tend to be platform images as those use the upstream versions) then property sets may be difficult to use and it may be better to use a different approach.
Thinking more about this, property sets may not be the greatest way to go? We can have multiple images in the cluster relating to one image in Artifactory. I can get emails going out when a vulnerability is first detected, but I'm not too sure how to handle cases where we do want to say "X container used by Y user is being held and put off for deletion". Then again, I'm not too sure how to incorporate the "hold off on deletion for X days" part anyway.
Found a decently 'nice' way of persisting data without the need for a PV.
1) The 'date' an image is to be deleted: we can use properties in Artifactory, since this would be the same across all images.
2) A notebook's 'hold' status (TBD): we can add a label to the notebook. I've checked, and these persist past notebook restarts. We will need to make sure that when setting "OnHold" the --overwrite option is used, and also that when the notebook is updated the "OnHold" label goes back to false.
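A minimal sketch of toggling that label; the notebook name and the label key "OnHold" are assumed names, and the command is echoed as a dry run rather than applied:

```shell
# Sketch: set/clear an OnHold label on a notebook resource.
# NOTEBOOK and the "OnHold" label key are hypothetical names.
NAMESPACE="jose-matsuda"
NOTEBOOK="my-notebook"
CMD="kubectl label notebook $NOTEBOOK --namespace $NAMESPACE OnHold=true --overwrite"
echo "$CMD"   # dry run; remove the echo to actually apply the label
```

Clearing the hold would be the same command with OnHold=false (or removing the label with OnHold-).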
One of the bullet points in https://github.com/StatCan/daaas/issues/461, and this step can be done at any time, because regardless of what happens next we need this information.
Using Artifactory, get a list of vulnerable images and compare them against images in the cluster. The information from this step should probably include the following:
namespace, image-name, container-name, date-found
where date-found is the day the image was found and tagged as vulnerable, not the day the vulnerability itself first existed. This must then be persisted somewhere. Logical steps (I have solutions for the first three; the persisting one will take me a bit of time).
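One way the persisted record could look, stamping date-found at detection time; a sketch where the field names follow the list above and the sample input line is hypothetical:

```shell
# Sketch: take one hit from the comparison step and append a DateFound field
# recording when the vulnerability was detected (not when it first existed).
line='{"Namespace":"jose-matsuda","ImagePath":"k8s/jupyterlab-cpu:abc123","Name":"my-notebook"}'
echo "$line" | jq -c --arg d "$(date +%Y-%m-%d)" '. + {DateFound: $d}'
```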
OF NOTE.
This step is not looking at any flat-out unused images. That would use the 'list of images from cluster' and then a 'full list of images older than X time' from Artifactory.
Also, with the 'smaller' demo scope of the two actions (meaning it is just done), the 'date' and 'on hold' aspects may not be very important moving forward. The 'date' can still be useful if we do the delete on the next business day or something, and at least we have the information on how to do it.