aboutcode-org / deltacode

DeltaCode: compare two codebase scans (from ScanCode) to detect significant changes.
http://www.aboutcode.org/

Simplify Factors reporting #107

Open mjherzog opened 6 years ago

mjherzog commented 6 years ago

The current approach of listing multiple Factors in one column makes it difficult to sort and filter file-level data in the primary DeltaCode reporting tool - i.e. a spreadsheet. It would be better to reset to three separate fields for:

  1. File "status" - Added, Modified, Moved, Removed or Unmodified (current definitions)
  2. Copyright - File contains one or more copyright notices OR none
  3. License - File contains one or more license notices/text OR none

With these three as separate fields, a user can more easily choose the relevant combinations of file information. This does not deal with the actual detection of a changed copyright or license notice. Rather, the idea is to show whether each file has copyright or license information. This assumes that we deprecate License Category reporting for now; see #106.
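
A minimal sketch of what the three separate fields could look like as spreadsheet columns. The `FileRow` class, the `to_csv` helper and the column names are hypothetical, chosen only for illustration; they are not DeltaCode's actual output format.

```python
import csv
import io
from dataclasses import dataclass

# Status values taken from the list above; lowercase spelling is an assumption.
STATUSES = ("added", "modified", "moved", "removed", "unmodified")


@dataclass
class FileRow:
    """One file-level row with the three proposed fields (hypothetical shape)."""
    path: str
    status: str          # exactly one of STATUSES
    has_copyright: bool  # one or more copyright notices, or none
    has_license: bool    # one or more license notices/texts, or none


def to_csv(rows):
    """Write rows as CSV with one column per field so each can be sorted/filtered."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["path", "status", "has_copyright", "has_license"])
    for row in rows:
        if row.status not in STATUSES:
            raise ValueError(f"unknown status: {row.status!r}")
        writer.writerow([row.path, row.status, row.has_copyright, row.has_license])
    return buf.getvalue()


rows = [
    FileRow("src/a.c", "added", True, False),
    FileRow("README", "unmodified", False, False),
]
print(to_csv(rows))
```

With one value per column, a spreadsheet filter such as "status = added AND has_license = False" becomes a plain column filter rather than a substring search in a combined Factors cell.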

johnmhoran commented 6 years ago

@majurg Will our new status, has_license and has_copyright attributes and related methods entirely replace the existing methods/scoring for license and copyright changes (i.e., no more use of license info added etc. and no related change to a Delta's score)?

Or might we want to retain that information for use in a future issue -- e.g., a CLI option the user can invoke to include this info in the JSON/CSV output -- storing it for now in some new Delta attributes like license_change and copyright_change?

steven-esser commented 6 years ago

@johnmhoran We can add two additional flags. Feel free to make a ticket specifically for that if you wish (no need if you want to include it in your current work).

We will need to think about ways to record the license key (and maybe copyright holders), but for now we can record just the fact we have a license change as a flag.

johnmhoran commented 6 years ago

Excellent. Thanks, @majurg. I'll include it in my current work rather than opening a new ticket. (I'm tackling issues #107, #109 and #110 together under the 107 rubric.)

If I understand your reply, we won't be storing the current strings (license info added and the others), but rather will treat the new attributes license_change and copyright_change as boolean, i.e., no change vs. 1+ changes, much like we're treating the new attributes has_license and has_copyright. Does that accurately capture our approach?
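
To make sure I have the boolean semantics right, here is a rough sketch of how the four flags could be derived. The `summarize_flags` function and its list-of-strings inputs are hypothetical; the real Delta objects and ScanCode data will look different.

```python
def summarize_flags(old_licenses, new_licenses, old_copyrights, new_copyrights):
    """Return the four boolean flags for one file pair.

    Inputs are plain lists of strings (e.g. license keys, copyright
    statements); this input shape is an assumption for illustration only.
    """
    return {
        # Presence flags describe the new file only.
        "has_license": bool(new_licenses),
        "has_copyright": bool(new_copyrights),
        # Change flags only say "no change" vs. "1+ changes";
        # they do not record what was added or removed.
        "license_change": set(old_licenses) != set(new_licenses),
        "copyright_change": set(old_copyrights) != set(new_copyrights),
    }


flags = summarize_flags(["mit"], ["mit", "gpl-2.0"], [], [])
print(flags)
```

Here a file that gains a second license gets `license_change` set to True, with no string such as license info added stored anywhere.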

steven-esser commented 6 years ago

Yes, exactly.

johnmhoran commented 6 years ago

Thanks.

macrovve commented 5 years ago

For the file, we could follow the git file status:

We could also use the algorithm Git uses to detect moved, renamed, or copied files.
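
As a rough illustration of the idea: Git pairs deleted and added paths by content similarity, with a default threshold of 50%. The sketch below is not Git's algorithm; it swaps in `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity score, and the `detect_moves` function and its inputs are hypothetical.

```python
from difflib import SequenceMatcher


def detect_moves(removed, added, threshold=0.5):
    """Pair removed files with added files whose content is similar enough.

    removed/added map path -> text content (hypothetical input shape).
    Returns a list of (old_path, new_path, score) tuples.
    """
    moves = []
    unmatched = dict(added)  # copy, so the caller's dict is not mutated
    for old_path, old_text in removed.items():
        best_path, best_score = None, 0.0
        for new_path, new_text in unmatched.items():
            # Stand-in similarity in [0, 1]; Git computes its own index.
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_path, best_score = new_path, score
        if best_path is not None and best_score >= threshold:
            moves.append((old_path, best_path, best_score))
            del unmatched[best_path]  # each added file is matched at most once
    return moves


removed = {"a.py": "print('hello')\n"}
added = {"src/a.py": "print('hello')\n", "b.py": "xyz"}
print(detect_moves(removed, added))
```

Files left unpaired would keep their plain added or removed status.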