aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, and dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Eclipse Public License v2.0 matches to epl-1.0 with epl_no-version.RULE #3961

Open jonna-debricked opened 3 weeks ago

jonna-debricked commented 3 weeks ago

Description

The version in "Eclipse Public License v2.0" is parsed incorrectly, which triggers the wrong rule (rid=epl_no-version.RULE) and yields epl-1.0.

The correct rule (rid=epl-2.0_7.RULE) is triggered for both "Eclipse Public License v 2.0" and "Eclipse Public License 2.0". "Eclipse Public License v1.0" also triggers the correct rule (rid=epl-1.0_18.RULE).

How To Reproduce

from licensedcode import cache

# Load the license index, then match the bare license name against it.
license_cache = cache.get_index()
matches = license_cache.match(
    query_string="Eclipse Public License v2.0", min_score=0
)
print(matches)
# ---> [LicenseMatch: 'epl-1.0', lines=(1, 1), matcher='2-aho', rid=epl_no-version.RULE, sc=98.0, cov=100.0, len=3, hilen=1, rlen=3, qreg=(0, 2), ireg=(0, 2)]
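
To compare the four spellings mentioned above side by side, a small helper along these lines may be useful. This is only a sketch: `rules_triggered` and `VARIANTS` are hypothetical names, and the helper assumes each returned match exposes `.rule.license_expression` and `.rule.identifier`, which is what the `LicenseMatch` repr above appears to print.

```python
def rules_triggered(index, texts):
    """Map each text to the (license expression, rule id) pairs it matches.

    `index` is assumed to behave like the object returned by
    licensedcode.cache.get_index(): it offers
    match(query_string=..., min_score=...) and returns matches exposing
    `.rule.license_expression` and `.rule.identifier` (assumed names).
    """
    results = {}
    for text in texts:
        matches = index.match(query_string=text, min_score=0)
        results[text] = [
            (m.rule.license_expression, m.rule.identifier) for m in matches
        ]
    return results


# The four spellings discussed in this report.
VARIANTS = [
    "Eclipse Public License v2.0",
    "Eclipse Public License v 2.0",
    "Eclipse Public License 2.0",
    "Eclipse Public License v1.0",
]
```

With ScanCode installed, calling `rules_triggered(cache.get_index(), VARIANTS)` and printing the result should make it easy to see which spelling falls back to `epl_no-version.RULE`.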

System configuration

pombredanne commented 3 weeks ago

@jonna-debricked Thanks for the report.

  1. Can you try with a more recent version of ScanCode? There have been over 1300 commits since 31.2.1, with 78,495 changed files, 1,543,686 additions and 439,184 deletions. These are massive changes.
  2. Can you provide exact links to the files in question, or attach them here? We need to reproduce the issue. Or did you just use a query string like in your example above?
  3. I reckon from this post that you may be using ScanCode in Debricked, which is awesome! Can you tell us more about it? How can we help you there?

AyanSinhaMahapatra commented 2 weeks ago

@jonna-debricked thanks for the report! This is reproducible using just the text pasted in the query string, with the latest scancode-toolkit versions:

"detected_license_expression": "epl-1.0",
      "detected_license_expression_spdx": "EPL-1.0",
      "license_detections": [
        {
          "license_expression": "epl-1.0",
          "license_expression_spdx": "EPL-1.0",
          "matches": [
            {
              "license_expression": "epl-1.0",
              "license_expression_spdx": "EPL-1.0",
              "from_file": "test",
              "start_line": 1,
              "end_line": 1,
              "matcher": "2-aho",
              "score": 98.0,
              "matched_length": 3,
              "match_coverage": 100.0,
              "rule_relevance": 98,
              "rule_identifier": "epl_no-version.RULE",
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/epl_no-version.RULE",
              "matched_text": "Eclipse Public License v2.0",
              "matched_text_diagnostics": "Eclipse Public License"
            }
          ],
          "identifier": "epl_1_0-f87b11f7-1165-d9fd-5ab3-d065b420b554"
        }
      ],

This should be fixed by the new PR https://github.com/aboutcode-org/scancode-toolkit/pull/3924, in which we create required phrases from rules and then create smaller rules from those required phrases.
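
To check whether a given build still shows this behaviour, the relevant fields can be pulled out of the JSON output shown above. The following is a minimal sketch using only the standard library. `summarize_detections` is a hypothetical helper name, and the sample is a trimmed copy of the fragment above, wrapped in braces so that it parses as a standalone object.

```python
import json


def summarize_detections(file_entry):
    """Return (license_expression, rule_identifier, matched_text) per match."""
    rows = []
    for detection in file_entry.get("license_detections", []):
        for match in detection.get("matches", []):
            rows.append((
                match.get("license_expression"),
                match.get("rule_identifier"),
                match.get("matched_text"),
            ))
    return rows


# Trimmed copy of the fragment above; the outer braces are added here.
sample = json.loads("""
{
  "detected_license_expression": "epl-1.0",
  "license_detections": [
    {
      "license_expression": "epl-1.0",
      "matches": [
        {
          "license_expression": "epl-1.0",
          "matcher": "2-aho",
          "rule_identifier": "epl_no-version.RULE",
          "matched_text": "Eclipse Public License v2.0"
        }
      ]
    }
  ]
}
""")

rows = summarize_detections(sample)
for expression, rule_id, text in rows:
    print(f"{text!r} -> {expression} via {rule_id}")
```

For the fragment above this reports that the text was matched as epl-1.0 via epl_no-version.RULE, which is the mismatch described in this issue.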