anchore / yardstick

Compare vulnerability scanner results (to make them better!)
Apache License 2.0

yardstick should know about quality gates #126

Open willmurphyscode opened 1 year ago

willmurphyscode commented 1 year ago

What would you like to be added:

Right now, yardstick can capture the difference between different tool outputs, but it has no notion of a label provider, and tools that compare yardstick results to known labels implement two mechanisms on their own:

  1. A mechanism to fetch some labels to use in comparison. (grype uses git submodules for this mechanism.)
  2. A mechanism to run a quality gate, for example so that too great a deviation from known labels fails a job in CI. Grype's gate.py is an example.

The core request is that a yardstick.yaml file should be able to configure two additional things: where labels come from, and which comparison methods and thresholds count as success versus failure.

One way we've discussed this is: a label source and a comparator should be sort of like "tools" in yardstick's config model.
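To make the idea concrete, here is a minimal sketch of what such a config model could look like in Python. Everything in it is an assumption for discussion: the class names (`LabelSource`, `Comparator`, `GateConfig`), the keys (`label-sources`, `comparators`, `min_f1`), and the example URL are hypothetical and are not part of yardstick's current schema. The input is a plain dict standing in for an already-parsed yardstick.yaml.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class LabelSource:
    """Where known labels come from (hypothetical schema)."""
    name: str
    kind: str      # illustrative values: "git", "directory"
    location: str


@dataclass(frozen=True)
class Comparator:
    """A comparison method plus its pass/fail thresholds (hypothetical schema)."""
    name: str
    method: str
    thresholds: dict[str, float] = field(default_factory=dict)


@dataclass(frozen=True)
class GateConfig:
    label_sources: list[LabelSource]
    comparators: list[Comparator]


def parse_gate_config(raw: dict[str, Any]) -> GateConfig:
    """Build a GateConfig from an already-loaded mapping, e.g. parsed YAML."""
    sources = [LabelSource(**s) for s in raw.get("label-sources", [])]
    comparators = [Comparator(**c) for c in raw.get("comparators", [])]
    return GateConfig(label_sources=sources, comparators=comparators)


# Stand-in for the parsed contents of a yardstick.yaml; every key is invented.
example = {
    "label-sources": [
        {"name": "default", "kind": "git", "location": "https://example.com/labels.git"},
    ],
    "comparators": [
        {"name": "f1-gate", "method": "f1", "thresholds": {"min_f1": 0.9}},
    ],
}

config = parse_gate_config(example)
```

Modeling label sources and comparators as named entries, the same way tools are listed today, would let a single config file describe both what to fetch and what counts as passing.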

Why is this needed:

Right now, we're maintaining a fair amount of Python code in different repos that fetches labels and runs comparisons. These scripts all fetch labels and run comparisons in similar, but not identical, ways. We should promote this duplicated code into a new configurable section in yardstick.

Additional context:

willmurphyscode commented 11 months ago

Currently there are quality gates in grype and vunnel:

Recently, both gates had a failure mode where they caused unexpected false positives; that is, gate.py was exiting zero when it should not have been. This seems like evidence that the quality gate needs to be better tested, but setting up testing for one-off Python scripts in different repos isn't sustainable. The ask here is that the functionality implemented by gate.py in these two repos (and possibly others I missed) should be made into a CLI command in yardstick.
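As an illustration of why a shared implementation would be easier to test, here is a rough sketch of a gate whose pass/fail decision is a pure function returning an exit code. It is not the logic of either existing gate.py: the names `LabelCounts`, `f1_score`, and `gate_exit_code` are hypothetical, and using F1 over labeled matches as the metric is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelCounts:
    """Counts of labeled matches for one tool run (hypothetical structure)."""
    true_positives: int
    false_positives: int
    false_negatives: int


def f1_score(counts: LabelCounts) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no counts."""
    denominator = (
        2 * counts.true_positives
        + counts.false_positives
        + counts.false_negatives
    )
    if denominator == 0:
        return 0.0
    return 2 * counts.true_positives / denominator


def gate_exit_code(counts: LabelCounts, min_f1: float) -> int:
    """Return 0 when the gate passes and 1 when it fails."""
    total = (
        counts.true_positives
        + counts.false_positives
        + counts.false_negatives
    )
    if total == 0:
        # Assumption: having nothing to compare is a failure, so a broken
        # label fetch cannot silently pass the gate.
        return 1
    return 0 if f1_score(counts) >= min_f1 else 1
```

A CLI command would then only need to compute the counts and pass the result to `sys.exit`, while unit tests could exercise `gate_exit_code` directly, including the case where no labels were found.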

Right now, the comparisons used are different, but this difficulty seems surmountable for two reasons: first, some of the differences are accidental; second, we could employ a strategy pattern or plugin model for differences that can't be refactored out.
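One possible shape for the strategy approach is sketched below: comparison methods register themselves under a name, and the gate looks up the method chosen in config. The registry, the `register` decorator, `run_gate`, and the two example strategies are all hypothetical, chosen only to show the pattern; they do not reflect the comparisons the existing gates actually perform.

```python
from typing import Callable, Dict

# A comparison strategy maps (tp, fp, fn) counts to a score in [0, 1].
Comparison = Callable[[int, int, int], float]

_REGISTRY: Dict[str, Comparison] = {}


def register(name: str) -> Callable[[Comparison], Comparison]:
    """Decorator that adds a comparison strategy to the registry by name."""
    def decorator(fn: Comparison) -> Comparison:
        if name in _REGISTRY:
            raise ValueError(f"comparison method already registered: {name!r}")
        _REGISTRY[name] = fn
        return fn
    return decorator


@register("precision")
def precision(tp: int, fp: int, fn: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


@register("recall")
def recall(tp: int, fp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


def run_gate(method: str, threshold: float, tp: int, fp: int, fn: int) -> bool:
    """Return True when the selected strategy's score meets the threshold."""
    try:
        compare = _REGISTRY[method]
    except KeyError:
        raise ValueError(f"unknown comparison method: {method!r}") from None
    return compare(tp, fp, fn) >= threshold
```

With this layout, a repo whose comparison genuinely differs would register its own strategy instead of carrying its own gate script, and the shared gate logic would be tested once.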