clj-holmes / clj-watson

clojure deps SCA
Eclipse Public License 2.0
77 stars 8 forks source link

Consider case where user wants to output formatted results to a file #98

Open lread opened 1 month ago

lread commented 1 month ago

Currently

Findings and other clj-watson messages are written to stdout. (And interesting logging from deps is written to stderr).

This seems fine for unformatted output types stdout and stdout-simple.

But...

Format types json, edn and sarif probably want to go to a file or be piped.

Maybe that's why clj-watson was suppressing all log output before?

Options

Option 1: Do nothing

Maybe we are cold, maybe we don't care.

Option 2: Provide an argument to output findings to a file

A bit of a sad thing is that -o --output is being used for output type. This option is typically used to specify and output file.

Option 3: Only write vulnerability findings to stdout

Write everything else to stderr.

Proposal

I guess option 3 as a minimum (but could also do option 2). But maybe you want to write important settings (see #79) and findings to a file?

I suppose we could include a summary of settings as part of the findings report.

seancorfield commented 1 month ago

I think having the important settings/overrides going to stdout along with the findings is fine for -o stdout and -o stdout-simple -- that's actually what we rely on at work.

For the other output formats, that's more problematic (and has in fact been broken for a long time, since I added a bunch of println output that no one has complained about!).

I don't know the right choice here but option 3 will break our flow at work.

Maybe everything that is println'd today should be log/info'd if the output format is not stdout(-simple) then that wouldn't break my flow and you'd fix the other formats which were broken in v5?

lread commented 1 month ago

So the existing printlns are important for you to redirect (or tee) for your use case. Are they all important? Here's the current println output I see after a local run on my dev box:

Read 119 dependency-check properties.
Merging additional properties:
  # clj-watson.properties file
  nvd.api.key=31a0****-****-****-****-********5002

The idea I snuck in at the end:

I suppose we could include a summary of settings as part of the findings report.

...was hinting that any important printlns could be moved (or also included if we also want to log them) in the report. Maybe add a :settings-used and/or :result-summary.

And.. this new data would be spit out with the stdout and stdout-simple free format reports.

But... current structured edn and json reports are not extensible in this way, they immediately spit out a list/vector of findings without any parent key. So doing so would mean a breaking change.

And maybe this idea does not fit with sarif (which I am not familiar with yet).

seancorfield commented 1 month ago

Right, that's why I just suggested toggling it from println for free format to logging (info) for structured formats. A pragmatic fix for the structured output that doesn't break any existing stdout-based usage. And sets a precedent for any future println output, such as displaying the number of findings etc (which also can't go in the structured formats).

I have no idea what people might be using those edn, json, or sarif outputs for -- although no one has complained about them being broken by me adding that println stuff. Maybe no one uses them?

I think if we introduce an output file option, there would be a good argument for leaving the println stuff as-is in all formats and writing only the findings report to the file.

lread commented 1 month ago

I was thinking that since the println outputs are important to you, they might be important to others, and if they are not moved to the findings report, they could not be easily saved when using structured formats.

But also, yes, I think your idea sounds like a reasonable compromise. And like you say, nobody has complained, so we might be putting effort into an unused feature.

lread commented 2 weeks ago

We don't know if folks are using structure reports (we are guessing no because of the extra printlns), but we can assume that nobody was using JSON because it was not being output as JSON #117.

seancorfield commented 1 week ago

I agree and that strongly suggests no one is currently relying on JSON, EDN, or Sarif output. Which makes me inclined to punt on this until someone asks for it.

But my sense of the right thing to do for this is:

That would provide a clear separation of logging (stderr), the tool's own chatty, informational stuff (stdout), and the actual vulnerability report (file).

Thoughts?

lread commented 1 week ago

The awkwardness of this nags at me a bit, so I am tempted to resolve it, as best we can, now. Lemme try to build on your idea.

Idea Refinement 1

How about this?

  1. If only --output-file is specified and does not include a recognized --output format as its filename extension, fail fast. Otherwise, use the filename extension as the output format written for the output file.
  2. If both --output-file and --output format are specified, output to file in --output format.
  3. If only --ouput-format is specified report is sent to stdout (structured or not, we don't care).
  4. Paste your 3rd point above here.

-f is already taken, so we can't use it as the short form for --output-file.

Idea Refinement 2

One more alternative:

  1. Change --output to also optionally recognize a filename.
  2. If you specify just the format, we get the same behaviour as today, the report goes to stdout.
  3. If you specify a filename, the extension must be a recognized format (else fail fast). The output goes to the file in the specified format.

This doesn't let you output a report to a file with an arbitrary extension, but it also doesn't add an extra cmd line option and doesn't have us scratching our heads as to what its short form might be.

Downsides

  1. There's an awkwardness to unstructured report format names stdout and stdout-simple. I could argue these were never good names. They could be deprecated (no longer documented but recognized) and replaced with txt and txt-brief (or something you prefer, but these seem ok).
  2. A user won't be able to pipe a structured report directly from clj-watson to another tool. I'm fine with that. I expect all the users that are using structured reports (0?) will be too.

Proposal

I like idea refinement 2 with unstructured format renames.

seancorfield commented 1 week ago

Just to check, since --output defaults to stdout, you're saying if --output-file has an extension that isn't .json, .edn, .sarif, or ... .txt, then omitting --output should be an error?

I'd be happy with -O for --output-file as it sort of compliments/overrides -o / --output maybe?

.txt-brief is not a "valid" file extension so I don't think we should use it.

lread commented 1 week ago

My 2nd refinement idea suggested no new --output-file and enhancing --output.

But, happy to have holes poked in this. Is it too magic or non-obvious?

It's a bit tricky to work around this awkwardness we've inherited.

Maybe your idea of -O, --output-file is clearer (with inferring format from extension when o, --output is not specified).

Is .txt-brief problematic in that it would not be recognized in IDEs and other software? What about .brief.txt?

If we go for separate --ouput and --output-file options, since --output is so badly named, should we consider a rename to --format? (While still keeping short form -o because -f is taken, and still recognizing --output for compatibility).

seancorfield commented 1 week ago

Does .sarif make sense as a file extension? SARIF is a specific JSON format (I finally went and looked it up), so the natural file extension would be .json.

Given stdout-simple and sarif are both variant formats of other reports (stdout and json respectively), I don't see how we can have a single output option that covers everything, without some weird contortions.

So, I still prefer -o for the output format and -O / --output-file for the destination file path. We could make the long form --output-format (and continue to support, but deprecate, --output).

lread commented 1 week ago

It seems .sarif is the preferred filename extension: https://docs.oasis-open.org/sarif/sarif/v2.1.0/errata01/os/sarif-v2.1.0-errata01-os-complete.html#_Toc141790671 but .json is an ok 2nd choice.

I think your idea of renaming --output to --output-format is a good compromise.

I also agree that not inferring by filename extension is the safer choice. It is a tad more typing for the user, but the options are not intermingled.

A user might be surprised/annoyed that when they've specified --output-file foo.edn that the default stdout format is used, but we don't have to solve all problems in this one issue.

lread commented 1 week ago

Option 4 - always write all reports to files

Inspired by DependencyCheck (and nvd-clojure, which writes DependencyCheck reports), always write all reports.

So a run would always write:

Control over the output directory would be nice, but report filenames would be fixed.

We could then deprecate --output entirely but would likely introduce --output-dir.

Or... not deprecate --output and leave it as is to avoid breaking workflows for folks.