aboutcode-org / scancode-analyzer

scancode-results-analyzer

Classify closely related and versioned licenses #40

Open AyanSinhaMahapatra opened 3 years ago


From @pombredanne at https://github.com/nexB/scancode-toolkit/issues/2399:

There is a problematic class of license notices: those involving closely related and versioned licenses.

The licenses involved in this class of ambiguous detections are:

  - mostly the AGPL, LGPL, and GPL, with and without versions;
  - to a lesser extent, other licenses such as the GFDL.
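One possible building block for handling these detections is recognizing when two detected keys belong to the same versioned family. Below is a minimal sketch, assuming ScanCode keys for these licenses follow a `<family>-<version>[-plus]` pattern, as `gpl-1.0-plus`, `gpl-2.0` and `gpl-3.0-plus` do in the scan excerpt that follows (where the bare text "the gpl" is reported as `gpl-1.0-plus`). `VersionedKey`, `parse_license_key` and `closely_related` are hypothetical names, not part of scancode-analyzer.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: versioned keys follow "<family>-<version>[-plus]", for example
# "gpl-2.0", "gpl-1.0-plus", "lgpl-2.1-plus", "agpl-3.0" or "gfdl-1.3".
_KEY_RE = re.compile(
    r"^(?P<family>agpl|lgpl|gpl|gfdl)-(?P<version>\d+(?:\.\d+)*)(?P<plus>-plus)?$"
)


@dataclass(frozen=True)
class VersionedKey:
    family: str               # e.g. "gpl"
    version: Tuple[int, ...]  # e.g. (2, 0)
    or_later: bool            # True for "-plus" keys ("or any later version")


def parse_license_key(key: str) -> Optional[VersionedKey]:
    """Split a license key into family, version and "or later" flag.

    Returns None for keys outside the A/L/GPL and GFDL families, and for keys
    with extra suffixes (such as exception variants), which are left unparsed.
    """
    m = _KEY_RE.match(key)
    if m is None:
        return None
    version = tuple(int(part) for part in m.group("version").split("."))
    return VersionedKey(m.group("family"), version, m.group("plus") is not None)


def closely_related(a: str, b: str) -> bool:
    """True when two different keys belong to the same versioned family."""
    pa, pb = parse_license_key(a), parse_license_key(b)
    return pa is not None and pb is not None and pa.family == pb.family and pa != pb


print(parse_license_key("gpl-1.0-plus"))
# VersionedKey(family='gpl', version=(1, 0), or_later=True)
print(closely_related("gpl-2.0", "gpl-3.0-plus"))  # True
print(closely_related("gpl-2.0", "lgpl-2.1"))      # False
```

This sketch treats only keys of the same family as closely related; whether a GPL/LGPL pair should also count is left open.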

The scan yields these licenses (scan has been edited for brevity):

    {
      "licenses": [
        {
          "key": "gpl-1.0-plus",
          "score": 85.0,
          "start_line": 1,
          "end_line": 1,
          "matched_text": "the gpl"
        },
        {
          "key": "gpl-3.0-plus",
          "score": 4.0,
          "start_line": 2,
          "end_line": 2,
          "matched_text": "therefore [this] is licensed under [the] gpl"
        },
        {
          "key": "gpl-2.0",
          "score": 100.0,
          "start_line": 2,
          "end_line": 2,
          "matched_text": "licensed under the gpl 2."
        }
      ]
    }
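To see why this excerpt is ambiguous, it helps to group the matches by line: line 2 alone carries two different GPL keys. The sketch below assumes only the `licenses`, `key`, `start_line` and `end_line` fields shown in the excerpt; `keys_by_line` is a hypothetical helper, not an existing scancode-analyzer function.

```python
import json
from collections import defaultdict
from typing import Dict, List

# The scan excerpt from this issue, embedded so the example is self-contained.
SCAN_EXCERPT = """
{"licenses": [
  {"key": "gpl-1.0-plus", "score": 85.0, "start_line": 1, "end_line": 1,
   "matched_text": "the gpl"},
  {"key": "gpl-3.0-plus", "score": 4.0, "start_line": 2, "end_line": 2,
   "matched_text": "therefore [this] is licensed under [the] gpl"},
  {"key": "gpl-2.0", "score": 100.0, "start_line": 2, "end_line": 2,
   "matched_text": "licensed under the gpl 2."}
]}
"""


def keys_by_line(scan: dict) -> Dict[int, List[str]]:
    """Map each line number to the license keys whose match covers that line."""
    by_line: Dict[int, List[str]] = defaultdict(list)
    for match in scan["licenses"]:
        for line in range(match["start_line"], match["end_line"] + 1):
            by_line[line].append(match["key"])
    return dict(by_line)


scan = json.loads(SCAN_EXCERPT)
print(keys_by_line(scan))
# {1: ['gpl-1.0-plus'], 2: ['gpl-3.0-plus', 'gpl-2.0']}
```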

This should become a new class of detected license issue: closely-related-license-notices.
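Adding the issue class could look roughly like the following. This is a sketch only: `LicenseIssueClass` and `DetectionIssue` are hypothetical names, and scancode-analyzer may represent issue classes differently. Only the class name `closely-related-license-notices` comes from this issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class LicenseIssueClass(str, Enum):
    """Hypothetical enum of detection issue classes; only the new member is shown."""
    CLOSELY_RELATED_LICENSE_NOTICES = "closely-related-license-notices"


@dataclass(frozen=True)
class DetectionIssue:
    """Hypothetical record for one flagged region of a scanned file."""
    issue_class: LicenseIssueClass
    start_line: int
    end_line: int
    candidate_keys: Tuple[str, ...]  # the competing keys, e.g. several GPL versions

    def to_dict(self) -> dict:
        """Serialize to plain JSON-compatible types."""
        return {
            "issue_class": self.issue_class.value,
            "start_line": self.start_line,
            "end_line": self.end_line,
            "candidate_keys": list(self.candidate_keys),
        }


issue = DetectionIssue(
    LicenseIssueClass.CLOSELY_RELATED_LICENSE_NOTICES,
    start_line=1,
    end_line=2,
    candidate_keys=("gpl-1.0-plus", "gpl-2.0", "gpl-3.0-plus"),
)
print(issue.to_dict()["issue_class"])  # closely-related-license-notices
```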

ToDo:

  1. Add this class of issue
  2. Add heuristic to classify these correctly
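For the second item, one possible heuristic is: cluster matches that overlap or sit on adjacent lines, then flag a cluster when it holds two or more distinct keys that all belong to the same versioned family (A/L/GPL or GFDL). Below is a minimal sketch under those assumptions; `closely_related_notices`, `cluster_matches`, `family_of` and the `max_gap` parameter are hypothetical, and the key pattern `<family>-<version>[-plus]` is assumed rather than taken from ScanCode's documentation.

```python
import re
from typing import Dict, List, Optional

# Assumption: keys in the affected families look like "gpl-2.0",
# "gpl-1.0-plus", "lgpl-2.1-plus", "agpl-3.0" or "gfdl-1.3".
FAMILY_RE = re.compile(r"^(agpl|lgpl|gpl|gfdl)-\d+(?:\.\d+)*(?:-plus)?$")


def family_of(key: str) -> Optional[str]:
    """Return the license family for a versioned key, or None if not recognized."""
    m = FAMILY_RE.match(key)
    return m.group(1) if m else None


def cluster_matches(matches: List[dict], max_gap: int = 1) -> List[dict]:
    """Group matches whose start line is at most max_gap lines after the
    current cluster's end line (so overlapping matches always join)."""
    clusters: List[dict] = []
    for match in sorted(matches, key=lambda m: (m["start_line"], m["end_line"])):
        if clusters and match["start_line"] <= clusters[-1]["end_line"] + max_gap:
            current = clusters[-1]
            current["end_line"] = max(current["end_line"], match["end_line"])
            current["matches"].append(match)
        else:
            clusters.append({
                "start_line": match["start_line"],
                "end_line": match["end_line"],
                "matches": [match],
            })
    return clusters


def closely_related_notices(matches: List[dict], max_gap: int = 1) -> List[Dict]:
    """Flag clusters holding two or more distinct keys from one versioned family."""
    flagged = []
    for cluster in cluster_matches(matches, max_gap):
        keys = sorted({m["key"] for m in cluster["matches"]})
        families = {family_of(k) for k in keys}
        if len(keys) > 1 and len(families) == 1 and None not in families:
            flagged.append({
                "issue": "closely-related-license-notices",
                "start_line": cluster["start_line"],
                "end_line": cluster["end_line"],
                "keys": keys,
            })
    return flagged


# Usage on the matches from the scan excerpt in this issue (trimmed to the
# fields the heuristic reads).
matches = [
    {"key": "gpl-1.0-plus", "start_line": 1, "end_line": 1},
    {"key": "gpl-3.0-plus", "start_line": 2, "end_line": 2},
    {"key": "gpl-2.0", "start_line": 2, "end_line": 2},
]
print(closely_related_notices(matches))
# [{'issue': 'closely-related-license-notices', 'start_line': 1, 'end_line': 2, 'keys': ['gpl-1.0-plus', 'gpl-2.0', 'gpl-3.0-plus']}]
```

Applied to the scan excerpt, all three GPL matches on lines 1 and 2 fall into one cluster and are flagged together. A fuller heuristic might also weigh each match's `score` (the `gpl-3.0-plus` match scores only 4.0), which this sketch ignores.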