aboutcode-org / deltacode

DeltaCode: compare two codebase scans (from ScanCode) to detect significant changes.
http://www.aboutcode.org/

License detection diffs are incorrect #191

Open Zach-Johnson opened 11 months ago

Zach-Johnson commented 11 months ago

It looks to me like the diffing of license detections is currently failing. For example, I added a package to a testing repository, and it is reflected in the diff:

{
  "status": "added",
  "factors": [],
  "score": 100,
  "new": {
    "path": "project/vendor/github.com/AndreasBriese/bbloom/LICENSE",
    "type": "file",
    "name": "LICENSE",
    "size": 1671,
    "sha1": "73e9520e4dfbadc8e525d8f38dff93a62f8623fb",
    "fingerprint": "",
    "original_path": "project/vendor/github.com/AndreasBriese/bbloom/LICENSE",
    "licenses": [],
    "copyrights": []
  },
  "old": null
}

However, the licenses field is empty. I would expect it to contain a reference to the new license, I think.

I see that there was a large PR on scancode that probably changed the output structure: https://github.com/nexB/scancode-toolkit/pull/2961/files, and I'm guessing that broke this. I'm happy to work on a fix for this if this project is still under development and will accept PRs.

AyanSinhaMahapatra commented 11 months ago

@Zach-Johnson thanks for reporting and offering to help!

We did make a large upgrade to license detection in scancode-toolkit, with a lot of breaking changes, and then missed updating this part of deltacode, which caused this. Each resource now has a license_detections field instead of the former licenses field. It is a list of LicenseDetection objects, each of which now has an identifier that can be used to check whether the detections have changed. Two identical detections share the same identifier, because the identifier includes a UUID created from the match contents. We also have a scan-level list of license detections that carries these identifiers for all unique license detections.
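The identifier-based comparison described above can be sketched roughly as follows. This is only an illustration, not deltacode's actual code: the helper names `detection_ids` and `diff_license_detections` are hypothetical, the identifier strings in the usage example are made-up placeholders, and the sketch assumes each resource is a plain dict loaded from scan JSON with a `license_detections` list whose entries carry an `identifier` key.

```python
from typing import Any, Dict, List, Optional, Set


def detection_ids(resource: Dict[str, Any]) -> Set[str]:
    """Collect the identifiers of all license detections on one resource."""
    # Scans made before the change have no "license_detections" key;
    # treat a missing or null value as "no detections".
    detections = resource.get("license_detections") or []
    return {d["identifier"] for d in detections if d.get("identifier")}


def diff_license_detections(
    old: Optional[Dict[str, Any]],
    new: Optional[Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare two versions of a resource by detection identifier.

    Either side may be None, as in a delta whose status is "added"
    (old is None) or "removed" (new is None).
    """
    old_ids = detection_ids(old) if old else set()
    new_ids = detection_ids(new) if new else set()
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        "unchanged": sorted(old_ids & new_ids),
    }


if __name__ == "__main__":
    # Placeholder identifiers, not real ScanCode output.
    old = {"license_detections": [{"identifier": "id-a"}, {"identifier": "id-b"}]}
    new = {"license_detections": [{"identifier": "id-b"}, {"identifier": "id-c"}]}
    print(diff_license_detections(old, new))
```

Because identical detections share an identifier, set operations are enough to tell added, removed, and unchanged detections apart without comparing the match contents themselves.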

See also https://scancode-toolkit.readthedocs.io/en/stable/reference/license-detection-reference.html for more info on why we added these changes and https://github.com/nexB/scancode-toolkit/blob/develop/CHANGELOG.rst#license-detection for the CHANGELOG on this.

> I'm happy to work on a fix for this if this project is still under development and will accept PRs

That's great! Please ask if you need any help with this or have any questions. I'll be very happy to help you update deltacode to work with the latest SCTK!

Zach-Johnson commented 11 months ago

@AyanSinhaMahapatra I've started a draft PR here: https://github.com/nexB/deltacode/pull/192. I'm not clear about a couple of things, but I'll move the discussion to that PR.