fsfe / reuse-tool

reuse is a tool for compliance with the REUSE recommendations.
https://reuse.software

Add a simplified machine parsable output for `reuse lint` #925

Open · nogweii opened 6 months ago

nogweii commented 6 months ago

This would be a middle ground between the full JSON output and the human-focused plain output. My idea here is to integrate `reuse lint` with tools like reviewdog, which work best with output along the lines of:

scripts/foo1.sh: missing license 'MPL-2.0'
scripts/foo2.sh: no licensing information
scripts/foo3.sh: no copyright information

One can generate such output by running the JSON version through a post-processing step, but I think it would be beneficial to include it upstream.
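
For the time being, here is a minimal post-processing sketch along those lines, assuming `reuse lint --json` as the entry point; the key names under "non_compliant" are assumptions and should be checked against the schema your reuse version actually emits:

import json
import subprocess

# Run reuse lint and parse its JSON report. The key names below
# ("non_compliant", "missing_licensing_info", "missing_copyright_info")
# are assumptions -- verify them against the actual JSON schema.
result = subprocess.run(
    ["reuse", "lint", "--json"], capture_output=True, text=True
)
report = json.loads(result.stdout)

non_compliant = report.get("non_compliant", {})
for path in non_compliant.get("missing_licensing_info", []):
    print(f"{path}: no licensing information")
for path in non_compliant.get("missing_copyright_info", []):
    print(f"{path}: no copyright information")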

KlfJoat commented 6 months ago

+1 to this. I'd also suggest allowing us to pick from a few different output formats, and those should be existing, standardized formats.

One simplified, machine-parseable option could be the TAP format. Since TAP parsers abound, the direct output of `reuse lint` could be used in CI/CD pipelines.

Another option, not simplified and not especially human-readable but very widely used, could be the JUnit XML format, which also has a lot of parsers and could be consumed by CI/CD pipelines, including the built-in reporting on platforms like GitLab.
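
For illustration, a few `reuse lint` findings mapped onto JUnit XML might look like the sketch below; the testsuite/testcase naming is just one possible choice, not an existing reuse output:

<?xml version="1.0" encoding="UTF-8"?>
<testsuite name="reuse lint" tests="3" failures="2">
  <testcase classname="reuse.lint" name="scripts/foo1.sh">
    <failure message="missing license 'MPL-2.0'"/>
  </testcase>
  <testcase classname="reuse.lint" name="scripts/foo2.sh">
    <failure message="no licensing information"/>
  </testcase>
  <testcase classname="reuse.lint" name="CHANGELOG.md"/>
</testsuite>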

carmenbianca commented 5 months ago

Thanks for the issue. This seems quite doable to me. Off the top of my head, an implementation would need:

The TAP format looks decent to me as far as machine-readable specs go. A vague proposal:

1..n
ok - ./CHANGELOG.md
ok - ./README.md
not ok - ./pyproject.toml
  ---
  failed:
    - no-copyright
    - no-license
  ...
# global tests
ok - no missing licenses
ok - no unused licenses
ok - no bad licenses
not ok - no licenses without extension
  ---
  failed:
    - ./LICENSES/MIT
  ...
ok - no read errors

The YAML is a bit verbose, but there's no other way to document why the test failed.

Figuring out the value of n in advance may be a little tricky. I suppose it's the number of files (computed in advance) plus the number of global tests (static).
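
A minimal sketch of that arithmetic, assuming both the file list and the set of global checks are enumerable before any output is written; the names here are illustrative, not existing reuse-tool API:

GLOBAL_CHECKS = [
    "no missing licenses",
    "no unused licenses",
    "no bad licenses",
    "no licenses without extension",
    "no read errors",
]

def emit_tap(files: list[str], failures: dict[str, list[str]]) -> None:
    # TAP wants the plan "1..n" first, so n must be known up front:
    # one test per file plus one per global check.
    print(f"1..{len(files) + len(GLOBAL_CHECKS)}")
    for path in files:
        if path not in failures:
            print(f"ok - {path}")
            continue
        print(f"not ok - {path}")
        print("  ---")
        print("  failed:")
        for reason in failures[path]:
            print(f"    - {reason}")
        print("  ...")
    # Global checks would be emitted analogously after the file tests.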

carmenbianca commented 5 months ago

I also had a look at reviewdog, linked in the first comment. The README is very shouty/busy, so I didn't make good progress skimming through it. Would the above work for reviewdog?

nicorikken commented 5 months ago

To elaborate on the reviewdog example: the video in the reviewdog README shows the suggested input format.

Details of the accepted input formats, along with an example, are described in its Input Format section.

The suggested format is {file}:{line number}:{column number}: {message}.

The format definitions follow Vim's errorformat syntax.
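
Assuming `reuse lint` grew a line-oriented output mode (the `--lines` flag below is hypothetical), a project-level reviewdog runner for the `{file}: {message}` lines from the first comment could look roughly like this, with %f capturing the file path and %m the message:

# .reviewdog.yml -- sketch only; `--lines` is a hypothetical flag.
runner:
  reuse:
    cmd: reuse lint --lines
    errorformat:
      - "%f: %m"
    level: error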

Another option for different uses might be the SARIF format. This, together with the JUnit XML format, was already mentioned in an earlier issue: https://github.com/fsfe/reuse-tool/issues/320

nicorikken commented 5 months ago

One question is how to deal with global errors like missing licenses.

KlfJoat commented 5 months ago

IMO, a missing license is a file-specific error: the license referenced by `somefile.ext` does not exist in `LICENSES/`. If a line number is needed, it's the line where the license is referenced.
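
Under that reading, a file-scoped report could point at the line holding the SPDX tag, for example:

scripts/foo1.sh:3: missing license 'MPL-2.0'

where line 3 would be the SPDX-License-Identifier: MPL-2.0 comment referencing the nonexistent license file.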

nicorikken commented 4 months ago

The initial attempt in #956 was based on the FileReports, but those only cover the missing_licenses and bad_licenses issues. Only the ProjectReport covers all issue types (missing_licenses, unused_licenses, bad_licenses, deprecated_licenses, licenses_without_extension, files_without_copyright, files_without_licenses, read_errors). I'll work on a new version based on the ProjectReport.
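
A sketch of what a line-oriented formatter over the ProjectReport could look like; the attribute names are taken from the list above, but their exact shapes (sets vs. dicts) are assumptions:

# Sketch: one output line per finding. Shapes are assumed here:
# missing_licenses as a mapping of license -> paths, the other
# attributes as iterables of paths or license identifiers.
def format_lines(report) -> list[str]:
    lines = []
    for path in sorted(report.files_without_licenses):
        lines.append(f"{path}: no licensing information")
    for path in sorted(report.files_without_copyright):
        lines.append(f"{path}: no copyright information")
    for lic in sorted(report.missing_licenses):
        for path in sorted(report.missing_licenses[lic]):
            lines.append(f"{path}: missing license '{lic}'")
    for lic in sorted(report.unused_licenses):
        lines.append(f"LICENSES/{lic}: unused license")
    return lines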

nicorikken commented 4 months ago

I now have a working version locally that assumes license files live in the LICENSES/ directory. That assumption might be a risk for some edge cases; I'll have to verify.

nicorikken commented 4 months ago

@nogweii @KlfJoat I have a working PR at #956. Can you please take a look to see whether it meets your expectations? I'm trying to run the output through reviewdog, but haven't been successful so far. Perhaps I'm doing something wrong.