bryanl / sheaf

Inspired by https://cnab.io/, sheaf manages bundles of Kubernetes components.
Apache License 2.0
29 stars 4 forks source link

Relaxed image replacement is too relaxed #30

Closed glyn closed 4 years ago

glyn commented 4 years ago

https://github.com/bryanl/sheaf/issues/24 introduced false positives which are perhaps a little too common for comfort.

For example, this simple deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
spec:
  template:
     spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

is transformed to:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-deployment
spec:
  template:
     spec:
      containers:
      - name: gcr.io/my-sandbox/library-nginx-dba37485fee3d4d76d5d82609cc9bccb
        image: gcr.io/my-sandbox/library-nginx-dba37485fee3d4d76d5d82609cc9bccb
        ports:
        - containerPort: 80

in which the container name has been corrupted.

glyn commented 4 years ago

One solution might be to treat the images found by scanning differently from those added using sheaf add-image. Those found by scanning could be replaced more stringently as the scanning logic knows how to find them and can avoid (most if not all) false positives. Those added using sheaf add-image could be replaced using a global change, but would still risk hitting false positives.

bryanl commented 4 years ago

We can remove a whole category of these types of issues by assuming that images will show up in predetermined locations. Trying to find an image will not work. For example, what's an example of a container image not showing up in something that doesn't look like a pod template spec?

glyn commented 4 years ago

Here are some examples: https://github.com/projectriff/k8s-manifest-scanner/tree/master/pkg/scan/testdata

glyn commented 4 years ago

We can indeed restrict the scanning logic to some fixed set of heuristics, but, to avoid false positives, the replacement logic had better follow the same heuristics.

Should we still support sheaf add-image as an escape route, as that seems prone to making the replacement logic hit false positives? Perhaps capturing some additional JSON paths (as you suggested a while back) would be a better escape route? (Do you know of any JSON path libraries that support "find and replace"?)

glyn commented 4 years ago

(Do you know of any JSON path libraries that support "find and replace"?)

@flimzy pointed me at gojq. Here's an example of using duck typing with find and replace.

@bryanl suggested the Kubernetes API machinery unstructured package and its SetNestedField function. This would require some way to determine the absolute path of a duck typed search.

@dprotaso suggested using go-yit as in this sample.

glyn commented 4 years ago

An experiment with go-yit turned up one wrinkle: deserialising, manipulating, and serialising YAML tends to modify the indentation and produce unexpectedly large diffs.

dprotaso commented 4 years ago

go-yit will at least preserve comments and most styling. You could try to infer indentation yourself or make the necessary changes to go-yaml.v3

glyn commented 4 years ago

It turns out go-yaml.v3 allows setting the indentation size to, say, 2. Then if the input YAML follows the same rule, the diff will be minimal.

I'm now exploring using a JSON path approach to find/replace so that we can support sheaf's user defined images.

glyn commented 4 years ago

I'm now exploring using a JSON path approach to find/replace so that we can support sheaf's user defined images.

See https://github.com/glyn/go-yamlpath.