Support multiple outputs in a single run

jawnsy commented 1 year ago

Thanks so much for producing and maintaining this excellent tool!

Summary

When running in build systems, it would be convenient to print a report to the UI as well as save a report to a file, sometimes in a different format.

Current behavior

Only one format/output pair can be specified, so we can output to a table or JSON in a given trivy run, but not both. Additionally, we can output results to the terminal or to a file, but not both.

Desired behavior

Configure format/file as a single variable, and allow multiple such values to be passed. For example: trivy image --output=json=out.json --output=table=- --output=cyclonedx=sbom.cdx

Workaround

If we want to log and show the same output format (for example, a table shown to stdout as well as recorded in a txt file), then we can use tee.

If we have different desired output formats, then there are a few workarounds:

Run the scan multiple times. Trivy is usually pretty fast, and if the image already exists, it's not too much work to scan the file contents twice.
Run with an intermediate SBOM format: we can use Trivy to generate an SBOM, then immediately "scan" the SBOM for the desired output format (e.g. generate a cyclonedx file, then scan the SBOM and output a table). However, this approach only works for vulnerability scanning, since the SBOM format is not meaningful for secret checks or config checks.

github-actions[bot] commented 1 year ago

This issue is stale because it has been labeled with inactivity.

jawnsy commented 1 year ago

It'd still be nice to have this feature. Judging by the 👍, this is something that lots of people would find useful.

romainsuire commented 1 year ago

We also need this feature

cnaslain commented 1 year ago

I would love to have this too. JSON + HTML or TXT.

manzsolutions-lpr commented 1 year ago

We actually run Trivy five times for that reason:

gitlab.tpl
html.tpl
junit.tpl
Full table on stdout
--exit-code 1 --ignore-unfixed --severity CRITICAL to cause the CI job to fail on critical vulnerabilities.

So while the duration is still reasonable for most projects, a slow scan literally multiplies itself.

Besides multiple outputs during a single run, somehow caching the results for re-runs would also work, but obviously has other caveats.

//edit: Hmm, bad research on my end: Apparently there is a cache but somehow it's not working for us yet:

https://aquasecurity.github.io/trivy/v0.37/docs/vulnerability/examples/cache/#cache-directory

https://github.com/aquasecurity/trivy/issues/2750

Z4ck404 commented 1 year ago

+1 Having this is super useful.

Mo0rBy commented 1 year ago

+1 My team would find this feature super useful instead of needing to run Trivy multiple times

itaysk commented 1 year ago

Trivy uses a robust cache, so running the same scan multiple times essentially doesn't perform a rescan; it just reformats the output. Given this information, do you still think multiple outputs are necessary, or is it reasonable to run trivy again to get another output (it will not rescan)?

exiett commented 1 year ago

> Trivy uses a robust cache, so running the same scan multiple times essentially doesn't perform a rescan; it just reformats the output. Given this information, do you still think multiple outputs are necessary, or is it reasonable to run trivy again to get another output (it will not rescan)?

IMHO it is better to support multiple outputs in a single run, because that makes the command run in the pipeline easier to maintain when flags are deprecated (as recently happened when the --scanners flag replaced the --security-checks flag).

knqyf263 commented 1 year ago

A lot of people seem to want it, so we decided to change our minds and support this feature. What about adding a new flag, --outputs?

trivy image --outputs json=out.json --outputs table=- --outputs cyclonedx=sbom.cdx

The existing flags --format and --output cannot be used with --outputs.

We need to think about templates as it also needs template strings or files. We'd love to hear your thoughts. Thanks.

exiett commented 1 year ago

On our end, it would greatly improve the experience to have stdout in a given format (we use table so the developer can easily spot the packages and their respective fixes) and also generate a JSON file containing the vulnerabilities that were found, so we can create security tickets for the developers to fix their repositories. The --outputs flag would work great.

knqyf263 commented 1 year ago

ChatGPT suggested this UI, and it looks good.

$ trivy image --outputs json:result.json --outputs table --outputs template:@junit.tpl:junit.xml --outputs template:@gitlab.tpl:gitlab.json

The default output is stdout, like the table in the above example. For templates, you can pass the template path in the form of template:/path/to/template_file[:/path/to/output].

itaysk commented 1 year ago

it's close to what we have in tracee, only difference is in the template example which in tracee is gotemplate=/my.tmpl:res.json. I think I like the tracee one better since it's clear the template file is on the side of the format and not the out file, and also it's clear what kind of template this is (gotemplate). just my thoughts if people prefer the suggested version it's also fine.

about the flag name - is there a way we can keep it --output/-o? I think pretty much every tool I know uses this flag name for controlling output format, especially -o json is a muscle memory for many folks. Actually, isn't the proposed --outputs compatible with the current --output?

If the flag value is format:file then there's no risk of conflict since this wasn't supported before.
If the flag value is a file path then it's same behavior as previously with --output.
if the flag value is format then this is the potential issue but it's easy to see if the value is a file or a format.

Yes, we will need to do some smart detection of the flag value, but as far as I understand the proposal, we will need to do it anyway in the new --outputs flag.

knqyf263 commented 1 year ago

After thinking for a while, I'm leaning towards the Buildkit approach. Something like the following:

$ trivy image --outputs format=json,out_file=result.json --outputs format=template,template=@junit.tpl,out_file=result.junit

This is because we might have more options for each output. For example, we might add support for template URLs.

$ trivy image --outputs format=template,template_url=https://example.com/trivy/templates/my_custom_report.tpl,out_file=result.txt

Also, I have a plan to generate SBOM and VEX referencing the SBOM.

$ trivy image --vex-template /path/to/vex.template --outputs format=spdx-json,out_file=trivy.spdx.json,vex_format=openvex,out_vex_file=trivy.openvex --outputs format=cyclonedx,out_file=trivy.sbom.cdx,vex_format=cyclonedx,out_vex_file=trivy.vex.cdx

SBOM and VEX formats can be specified independently:

SBOM: SPDX JSON, VEX: OpenVEX
SBOM: CycloneDX, VEX: CycloneDX
SBOM: CycloneDX, VEX: OpenVEX

It is hard to represent these structured options with --outputs json:result.json.

> about the flag name - is there a way we can keep it --output/-o? I think pretty much every tool I know uses this flag name for controlling output format, especially -o json is a muscle memory for many folks.

I'm also sure many Linux tools use the --output <file> style, such as curl, sort, base64, git, etc. For example, I've been using the following flag in cURL millions of times more than kubectl and aws.

$ curl -h
 -o, --output <file>        Write to file instead of stdout

I want to keep the current behavior of --output so it will not add a breaking change.

> Yes, we will need to do some smart detection of the flag value

I thought a new flag was more intuitive for users, but we can use the existing --output for the structured options.

If --output doesn't contain =, consider the value as a file path.

$ trivy image --output result.json --format json IMAGE_NAME

If --output contains =, consider the value as structured options.

$ trivy image --output format=json,out_file=result.json --output format=template,template=@junit.tpl,out_file=result.junit

--format will be ignored, and Trivy will show a warning message.

The downside of this detection is that file paths can include =, which leads to false detection. We can probably do that smarter, though.
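To make the false-detection risk concrete, here is a minimal Python sketch of the rule described above, plus one possible "smarter" variant. This is not Trivy code: the function names and the KNOWN_KEYS set are assumptions for illustration, with the keys taken from the examples in this thread.

```python
# Hypothetical option keys, taken from the examples in this thread.
KNOWN_KEYS = {"format", "out_file", "template", "template_url"}


def contains_equals(value):
    """Naive rule: any '=' means the value holds structured options."""
    return "=" in value


def looks_structured(value, known_keys=KNOWN_KEYS):
    """Stricter rule (an assumption, not the proposed behavior):
    every comma-separated piece must be key=value with a known key."""
    for pair in value.split(","):
        key, sep, _ = pair.partition("=")
        if not sep or key not in known_keys:
            return False
    return True


# A plain file path is not structured under either rule.
print(contains_equals("result.json"), looks_structured("result.json"))
# A path containing '=' fools the naive rule but not the stricter one.
print(contains_equals("reports/a=b.json"), looks_structured("reports/a=b.json"))
```

Even the stricter variant would still misread a file literally named format=json, so it narrows the ambiguity rather than removing it.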

itaysk commented 1 year ago

Makes sense, I think the two suggestions are closer than it seems, except the colon divider. I think we should plan this with plugins in mind. maybe plugins can be designated for formatting or outputting and then used seamlessly with trivy. I'll think of a suggestion that considers all that and post here

itaysk commented 1 year ago

@knqyf263 I'm summarizing your suggestion and tweaking it a bit to address my wishlist, let me know what you think:

Requirements

A single "outputs" flag should contain all the information for one output scenario.
A typical output scenario includes:
1. Format - how to serialize the results
2. Destination - where to write the results to
Specific formats, and destinations can have their own configuration.
TBD, Formats and destinations can be builtin (e.g. json, stdout) or plugins (e.g. html, aws-securityhub)

Usage

General form: --outputs format=myformat[,myformat_setting=value...],dest=mydestination[,mydestination_setting=value]
At the very least, outputs define format= and dest=
Specific configurations depend on the format and dest used. For example, if dest=file is specified, then file_path= is mandatory. But if format=table is specified, then table_width= is optional.
Specific configurations are conventionally prefixed with the format/dest they refer to.
By default, format=table,dest=stdout is selected, so omitting either is fine.
A special shorthand is available: if the content of --outputs is a string with no =, then it is interpreted as the value of format= or file_path=, depending on whether it contains a slash / or dot . character. For example, --outputs json is the same as --outputs format=json, and --outputs /path/to/file is the same as --outputs dest=file, file_path=/path/to/file.
TBD, given the previous point, is --outputs compatible with the current --output?
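The usage rules above can be sketched as a small parser. This is an illustration in Python under stated assumptions, not Trivy's implementation: parse_output_spec is a hypothetical name, and the error handling is one guess at how "file_path= is mandatory" might be enforced.

```python
def parse_output_spec(spec):
    """Parse one --outputs value following the proposed rules (sketch only)."""
    spec = spec.strip()
    if "=" not in spec:
        # Shorthand: a value with '/' or '.' is a file path, otherwise a format.
        if "/" in spec or "." in spec:
            return {"format": "table", "dest": "file", "file_path": spec}
        return {"format": spec, "dest": "stdout"}

    options = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("expected key=value, got %r" % pair)
        options[key.strip()] = value

    # Defaults from the proposal: format=table, dest=stdout.
    options.setdefault("format", "table")
    options.setdefault("dest", "stdout")
    if options["dest"] == "file" and "file_path" not in options:
        raise ValueError("dest=file requires file_path=")
    return options


print(parse_output_spec("json"))
print(parse_output_spec("/path/to/file"))
print(parse_output_spec("format=json, dest=file, file_path=out.json"))
```

Note that the shorthand cannot distinguish a format name containing a dot from a file name, which is part of what the TBD item above leaves open.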

Plugins

We need to discuss plugins in a separate issue, but since this proposal takes into account the future design of plugins, I'd address the relevant assumptions I'm making:

Trivy will support "format" and "destination" plugins.
Users register a plugin with Trivy before running a scan, possibly using the existing trivy plugin install mechanism.
Users can use a plugin for formatting or destination just like a builtin. For example, --outputs dest=webhook, webhook_url=http://myendpoint, or --outputs format=html, html_usejavascript=true

Builtin formats

table (default)
1. table_colors (true/false)
json
sarif (?)
spdx-json
cyclonedx
gotemplate
1. gotemplate_file (/path/to/file.tmpl)

Builtin destinations

stdout (default)
file
1. file_path (/path/to/file)

knqyf263 commented 1 year ago

It basically looks good. There are some things to discuss.

Usage

> Specific configurations depend on the format and dest used. For example, if dest=file is specified, then file_path= is mandatory. But if format=table is specified, then table_width= is optional.

dest is enough, no? --outputs dest=foo.json (file) or --outputs dest=- (stdout) can describe all the destinations.

> A special shorthand is available: if the content of --outputs is a string with no =, then it is interpreted as the value of format= or file_path=, depending on whether it contains a slash / or dot . character. For example, --outputs json is the same as --outputs format=json, and --outputs /path/to/file is the same as --outputs dest=file, file_path=/path/to/file.

It doesn't seem to be very easy. I want to keep the current behavior of --format and --output, so it covers this shorthand. It means there is no special shorthand.

> At the very least, outputs define format= and dest=

The above rule must be satisfied.

Plugins

> Users can use a plugin for formatting or destination just like a builtin. For example, --outputs dest=webhook, webhook_url=http://myendpoint, or --outputs format=html, html_usejavascript=true

Is there any advantage to distinguishing between formatting and destination? What about using --plugin=, like --outputs plugin=webhook,plugin.webhook_url=http://myendpoint --outputs plugin=html, plugin.html_usejavascript=true? The plugin will be executed with the JSON result passed through stdout, and the plugin.xxx options will be passed to the plugin.

$ trivy image debian:11 --outputs plugin=webhook,plugin.webhook_url=http://myendpoint,plugin.use_ssl

is the same as

$ trivy image debian:11 -f json | trivy-plugin-webhook --webhook-url=http://myendpoint --use_ssl

The same applies to formatting plugins.

$ trivy image debian:11 --outputs 'plugin=csv,plugin.delimiter=;'

would be

$ trivy image debian:11 -f json | trivy-plugin-csv --delimiter=';'

We would expect the plugin to also work standalone.

$ trivy image debian:11 -f json -o debian11.json
$ trivy csv --delimiter=';' ./debian11.json

Several plugins are accepted.

$ trivy image debian:11 --outputs plugin=webhook,plugin.webhook_url=http://myendpoint,plugin.use_ssl --outputs 'plugin=csv,plugin.delimiter=;'
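The translation from a plugin= output value to a plugin command line could look roughly like the Python sketch below. It is a guess at the mechanics, not an existing Trivy feature: plugin_argv is a hypothetical helper, the trivy-plugin-<name> executable name follows the examples above, and option names are passed through unchanged because the thread does not settle whether underscores should become dashes.

```python
def plugin_argv(spec):
    """Turn 'plugin=NAME,plugin.KEY=VALUE,...' into an argv list (sketch)."""
    name = None
    args = []
    for pair in spec.split(","):
        key, sep, value = pair.strip().partition("=")
        if key == "plugin":
            name = value
        elif key.startswith("plugin."):
            flag = "--" + key[len("plugin."):]
            # Options without '=' (e.g. plugin.use_ssl) become bare switches.
            args.append(flag + "=" + value if sep else flag)
        else:
            raise ValueError("unexpected option: %r" % key)
    if not name:
        raise ValueError("plugin= is required")
    return ["trivy-plugin-" + name] + args


print(plugin_argv("plugin=webhook,plugin.webhook_url=http://myendpoint,plugin.use_ssl"))
print(plugin_argv("plugin=csv,plugin.delimiter=;"))
```

One limitation of splitting on commas this way is that an option value containing a comma (for example a comma delimiter for CSV) could not be expressed without some escaping rule.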

Template URLs

Also, we have to think about remote templates. https://github.com/aquasecurity/trivy/issues/4079

Or do we decline this suggestion and ask people to create plugins rather than templates?

knqyf263 commented 1 year ago

After starting the implementation, I realized that in spf13/cobra and spf13/viper, repeated flags are comma-separated and concatenated.

$ trivy image --outputs format=table,dest=table.txt --outputs format=json,dest=foo.json debian:11

It is treated as format=table,dest=table.txt,format=json,dest=foo.json. It is a bit difficult to determine which output group a key/value pair belongs to.

$ trivy image --outputs format=table --outputs dest=foo.txt,format=json debian:11

In the above case, the outputs would be format=table,dest=foo.txt,format=json. Is this destination for table or json?

I have some ideas.

1. The output must start with format=.
   - It must be --outputs format=table,dest=foo.txt; --outputs dest=foo.txt,format=table is not allowed.
2. Use {}
   - AWS CLI uses this syntax.
   - e.g. $ trivy image --outputs {format=table,dest=table.txt} --outputs {format=json,dest=foo.json} debian:11
3. Use [] or ()
   - gcloud uses this syntax.
   - e.g. $ trivy image --outputs table(dest=table.txt) --outputs plugin(name=csv,dest=foo.json) debian:11
4. Use a different separator such as ; and &
   - e.g. $ trivy image --outputs format=table;dest=table.txt --outputs format=json;dest=foo.json debian:11
5. Escape double quotes
   - e.g. $ trivy image --outputs \"format=table,dest=table.txt\" --outputs \"format=json,dest=foo.json\" debian:11

I'm not sure if 2 and 3 work, as the value might still be split on commas inside the brackets. Any ideas are welcome.

UPDATE: 2 and 3 didn't work. Viper reads values as csv.
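The ambiguity and the first idea (every output must start with format=) can be sketched in Python. The flatten function only imitates the comma-separated concatenation described above; it does not call cobra or viper, and both function names are hypothetical.

```python
def flatten(flag_values):
    """Imitate repeated flag values being split on commas and concatenated."""
    return [pair for value in flag_values for pair in value.split(",")]


def group_by_format(pairs):
    """Start a new output group at every format= key (idea 1 above)."""
    groups = []
    for pair in pairs:
        key, _, value = pair.partition("=")
        if key == "format":
            groups.append({})
        elif not groups:
            raise ValueError("each output must start with format=")
        groups[-1][key] = value
    return groups


pairs = flatten(["format=table,dest=table.txt", "format=json,dest=foo.json"])
print(group_by_format(pairs))

# The ambiguous case: dest=foo.txt ends up attached to the table output,
# even though the user wrote it next to format=json.
print(group_by_format(flatten(["format=table", "dest=foo.txt,format=json"])))
```

The second call shows why the ordering rule would have to be enforced strictly: once the values are flattened, the grouping can silently differ from what the user typed.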

knqyf263 commented 1 year ago

I found k8s defined a custom flag. We can probably do the same thing. https://github.com/kubernetes/component-base/blob/18782b4b48a05c81a098f3be4c6665bdab5c851d/cli/flag/string_slice_flag.go#L27-L32
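The linked code is Go and is not reproduced here. As a language-neutral illustration of what a custom flag type could buy us, assuming the goal is to keep each occurrence of a repeated flag as its own value instead of merging everything into one comma-separated list, here is a Python argparse sketch. The parser, the parse_outputs helper, and the "demo" program name are made up for the example.

```python
import argparse

parser = argparse.ArgumentParser(prog="demo")
# action="append" stores each --outputs occurrence as a separate list item.
parser.add_argument("--outputs", action="append")
parser.add_argument("target")


def parse_outputs(argv):
    """Return one dict of options per --outputs occurrence."""
    args = parser.parse_args(argv)
    groups = []
    for value in args.outputs or []:
        groups.append(dict(pair.split("=", 1) for pair in value.split(",")))
    return groups


print(parse_outputs([
    "--outputs", "format=table",
    "--outputs", "dest=foo.txt,format=json",
    "debian:11",
]))
```

Because the occurrences stay separate, dest=foo.txt stays with format=json regardless of key order, which is the property the flattened form loses.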

itaysk commented 1 year ago

> dest is enough, no? --outputs dest=foo.json (file) or --outputs dest=- (stdout) can describe all the destinations.

I had in mind destinations other than file that some users have asked for and that could be plugins. Examples: defectdojo, sonarqube, webhook and even aqua (future integration). Also, dest=file is default, so I think you can still configure file output with a single setting: --outputs file_path=/path/to/file.

> Is there any advantage to distinguishing between formatting and destination?

I thought it's the same motivation as having separate --format and --output flags. If a single plugin does both, there might be a lot of redundant work. For example, JSON serialization can be implemented once (--outputs format=json) and sent to different destinations (--outputs dest=file / --outputs dest=webhook / --outputs dest=aqua). Another example is SBOM formats: with --outputs format=spdx-json, dest=webhook there is no need to reimplement SPDX in the plugin.

itaysk commented 1 year ago

After discussing this at length offline, we have realized that we were conflating different solutions. Providing multiple outputs in Trivy is quite complicated, because a single run of Trivy can produce different kinds of outputs depending on the scanners involved. But the underlying use case of running a scan once, and repurposing the results for different use cases, is something we can improve. This has been discussed in the past in "Multiple report options" (Issue #720) and also implemented in "feat(conversion): from a json report generate other repo" by utix (Pull Request #3014). We will follow up on that proposal and add it to Trivy. This should answer the problem without the complexity of multiple outputs, hence I will be closing this issue. Unrelated, we will discuss plugins as outputs in "plugin as output option" (Discussion #4451).

reitzig commented 9 months ago

For reference -- all those issue links sent me down some loopy loops! -- the answer as of today is: create a report in Trivy JSON format, then use trivy convert.
