ahrefs / atd

Static types for JSON APIs

atddiff: provide output without line/column info for more stable results #377

Closed: mjambon closed this issue 11 months ago.

mjambon commented 11 months ago

Problem

We want to ignore findings that we have already reviewed. Our approach is to keep the atddiff output in git. Each new commit that modifies the ATD file produces a new report, which is diffed against the previous version of the report. We want this diff to be meaningful, showing only new findings or findings that disappeared.

Here's an example of a diff on the atddiff output:

 Incompatibility in both directions:
 File "semgrep_output_v1.atd", line 1071, characters 21-35
-File "semgrep_output_v1.atd", line 993, characters 21-28:
+File "semgrep_output_v1.atd", line 957, characters 21-28:
 Incompatible kinds of types: option is now a string.
 The following types are affected:
   project_metadata

The - and + lines exist only because inserting some code shifted many of the definitions in the file without changing their contents.

Proposed solution

  1. Support a --no-locations option that suppresses the output of the variable parts (e.g. line 1071, characters 21-35).
  2. If possible (and reasonably easy), output a hash identifying each finding based on the structure of the types being compared.
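Until such an option exists, a similar effect can be approximated by post-processing the report. The following Python sketch is a hypothetical workaround, not part of atddiff: it assumes location lines look exactly like those in the example above (File "<path>", line N, characters A-B), and the name strip_locations is made up for illustration.

```python
import re

# Variable part of a location line, e.g. ', line 1071, characters 21-35'.
# ASSUMPTION: the format is inferred from the example report above.
_LOCATION = re.compile(r", line \d+, characters \d+-\d+")


def strip_locations(report: str) -> str:
    """Drop line/column info from 'File "...", line N, characters A-B' lines.

    The file name is kept because it does not change when code moves around.
    """
    out = []
    for line in report.splitlines():
        if line.lstrip().startswith('File "'):
            line = _LOCATION.sub("", line)
        out.append(line)
    return "\n".join(out)


old = 'File "semgrep_output_v1.atd", line 993, characters 21-28:'
new = 'File "semgrep_output_v1.atd", line 957, characters 21-28:'
# After stripping, the two lines are identical, so a git diff shows nothing.
assert strip_locations(old) == strip_locations(new)
print(strip_locations(new))  # File "semgrep_output_v1.atd":
```

Keeping the file name while dropping only line and column numbers preserves enough context to tell findings from different files apart.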

Producing a stable hash for each pair of types being compared may be a lot of work. An approximate solution would be to identify findings based only on the error message (which may be sufficiently unique in practice), e.g. by hashing the following text:

 Incompatible kinds of types: option is now a string.
 The following types are affected:
   project_metadata
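A minimal Python sketch of this approximate identifier, assuming the message text has already been separated from its location lines. The function name finding_id, the use of SHA-256 and the 12-character truncation are arbitrary choices for illustration, not atddiff behavior.

```python
import hashlib


def finding_id(message: str, length: int = 12) -> str:
    """Return a short identifier derived only from a finding's message.

    No file locations are hashed, so shifting definitions in the ATD file
    does not change the identifier.
    """
    # Ignore trailing whitespace and surrounding blank lines so that
    # cosmetic differences in the report do not change the hash.
    lines = [line.rstrip() for line in message.strip("\n").splitlines()]
    digest = hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
    return digest[:length]


msg = """Incompatible kinds of types: option is now a string.
The following types are affected:
  project_metadata"""
print(finding_id(msg))
```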

When multiple findings end up with the same error message, and hence the same identifier, we can report the number of such findings. This is only for diff purposes: to investigate a finding, the user would request the full output with locations anyway.
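The counting step could look like the following Python sketch. It assumes the findings have already been extracted as location-free message strings; summarize and its one-line-per-message output format are hypothetical. Sorting the output keeps the summary stable between runs, so a git diff only shows messages or counts that actually changed.

```python
import hashlib
from collections import Counter


def summarize(messages: list[str]) -> list[str]:
    """Collapse findings with identical messages into one line with a count.

    ASSUMPTION: each message is a finding's text with locations removed.
    """
    counts = Counter(m.strip() for m in messages if m.strip())
    lines = []
    # Sort by message so the output order does not depend on input order.
    for message, n in sorted(counts.items()):
        fid = hashlib.sha256(message.encode("utf-8")).hexdigest()[:12]
        first_line = message.splitlines()[0]
        lines.append(f"{fid} x{n} {first_line}")
    return lines


msg = """Incompatible kinds of types: option is now a string.
The following types are affected:
  project_metadata"""
for line in summarize([msg, msg]):
    print(line)
```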