aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

`UPL-1.0 or Apache-2.0` is detected instead of just `Apache-2.0` #2274

Status: open · fviernau opened this issue 4 years ago

fviernau commented 4 years ago

Description

Scanning this line results in `upl-1.0` being detected in addition to `apache-2.0`, via this rule.

How To Reproduce

./scancode in.txt -l --json-pp out.txt
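For convenience, the reproduction can be scripted. The Python sketch below writes the line quoted in the next comment to `in.txt` and runs the same command (`-l` plus `--json-pp`). It assumes ScanCode is installed on the `PATH` as `scancode` rather than run as `./scancode` from its install directory; the temporary working directory and the helper names `build_command`/`reproduce` are illustrative choices, not part of ScanCode.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path

# The sbt line from this issue that triggers the extra upl-1.0 match.
SAMPLE_LINE = 'licenses := Seq("Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"))\n'


def build_command(scancode: str, in_file: Path, out_file: Path) -> list[str]:
    """Return the argv used in the issue: license scan with pretty-printed JSON output."""
    return [scancode, str(in_file), "-l", "--json-pp", str(out_file)]


def reproduce(workdir: Path) -> Path | None:
    """Write in.txt and scan it; return out.txt, or None if scancode is not on PATH."""
    in_file = workdir / "in.txt"
    out_file = workdir / "out.txt"
    in_file.write_text(SAMPLE_LINE, encoding="utf-8")

    scancode = shutil.which("scancode")  # assumption: installed on PATH
    if scancode is None:
        print("scancode not found on PATH; skipping the scan")
        return None

    subprocess.run(build_command(scancode, in_file, out_file), check=True)
    return out_file


with tempfile.TemporaryDirectory() as tmp:
    result = reproduce(Path(tmp))
    if result is not None:
        print(result.read_text(encoding="utf-8"))
```

If ScanCode is available, the printed JSON should contain the `upl-1.0_or_apache-2.0_1.RULE` match described below.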

System configuration

pombredanne commented 4 years ago

@fviernau Thanks! The text at play is `licenses := Seq("Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"))`. If I run `scancode --license --license-text --license-text-diagnostics ...`, we get:

          "matched_rule": {
            "identifier": "upl-1.0_or_apache-2.0_1.RULE",
            "license_expression": "upl-1.0 OR apache-2.0",
            "licenses": [
              "upl-1.0",
              "apache-2.0"
            ],
            "is_license_text": false,
            "is_license_notice": true,
            "is_license_reference": false,
            "is_license_tag": false,
            "matcher": "3-seq",
            "rule_length": 50,
            "matched_length": 10,
            "match_coverage": 20.0,
            "rule_relevance": 100.0
          },
          "matched_text": "licenses := [Seq](\"[Apache] [2].[0] License\" -> [url](\"http://www.apache.org/licenses/LICENSE-2.0."

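The `match_coverage` above is the share of the rule's words that were matched: 10 of 50, i.e. 20%. The Python sketch below recomputes that figure and walks a scan result to flag weak matches. It assumes the JSON layout used by the ScanCode version in this thread (a `files` list whose entries carry `licenses`, each with a `matched_rule`); newer releases restructured license results. The sample data is hand-built from the values above, not real output, and the 50% threshold simply mirrors the value proposed below, not a ScanCode default.

```python
from __future__ import annotations


def coverage(matched_length: int, rule_length: int) -> float:
    """Percentage of the rule's words that were matched, rounded to 2 decimals."""
    return round(matched_length * 100 / rule_length, 2)


def low_coverage_matches(scan: dict, threshold: float = 50.0) -> list[tuple[str, str, str, float]]:
    """Return (path, rule identifier, license expression, coverage) for matches below threshold.

    In this output layout a match on a multi-license rule is repeated once per
    license key, so results are de-duplicated by (path, rule identifier).
    """
    weak = []
    seen = set()
    for file_entry in scan.get("files", []):
        path = file_entry.get("path", "")
        for lic in file_entry.get("licenses", []):
            rule = lic.get("matched_rule", {})
            key = (path, rule.get("identifier", ""))
            cov = rule.get("match_coverage", 0.0)
            if key in seen or cov >= threshold:
                continue
            seen.add(key)
            weak.append((path, key[1], rule.get("license_expression", ""), cov))
    return weak


# Hand-built stand-in for out.txt, reusing the matched_rule values shown above.
upl_or_apache = {
    "identifier": "upl-1.0_or_apache-2.0_1.RULE",
    "license_expression": "upl-1.0 OR apache-2.0",
    "rule_length": 50,
    "matched_length": 10,
    "match_coverage": 20.0,
}
sample_scan = {
    "files": [
        {
            "path": "in.txt",
            "licenses": [
                {"key": "upl-1.0", "matched_rule": upl_or_apache},
                {"key": "apache-2.0", "matched_rule": upl_or_apache},
            ],
        }
    ]
}

print(coverage(10, 50))  # 20.0
print(low_coverage_matches(sample_scan))
```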
Here is how I would go about solving this:

  1. Edit `upl-1.0_or_apache-2.0_1.yml` to add `minimum_coverage: 50`, since a coverage of 20% (i.e. 10 of 50 words) is too low to be a reliable match.
  2. Create two new rules that report only `license_expression: apache-2.0`, with `is_license_tag: yes`, `minimum_coverage: 95` and `relevance: 100`, using these texts:
    • licenses := Seq("Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"
    • "Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"
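The two steps above can be sketched in Python as follows. It writes each proposed rule as a `.RULE` text file with a matching `.yml` data file (the same pairing as `upl-1.0_or_apache-2.0_1.yml`), and appends `minimum_coverage` to an existing `.yml`. The base name `apache-2.0_sbt` and the output directory are hypothetical; in a ScanCode checkout of this era the rules are assumed to live under `src/licensedcode/data/rules/`, so check the layout of your version before using this.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Rule texts proposed in step 2, kept verbatim (including the unclosed parentheses).
RULE_TEXTS = [
    'licenses := Seq("Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"',
    '"Apache 2.0 License" -> url("http://www.apache.org/licenses/LICENSE-2.0.html"',
]

# Rule data proposed in step 2, written as plain YAML key/value lines.
RULE_DATA = (
    "license_expression: apache-2.0\n"
    "is_license_tag: yes\n"
    "relevance: 100\n"
    "minimum_coverage: 95\n"
)


def write_rules(rules_dir: Path, base_name: str = "apache-2.0_sbt") -> list[Path]:
    """Write <base_name>_<n>.RULE and .yml pairs; base_name is a hypothetical choice."""
    rules_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for n, text in enumerate(RULE_TEXTS, start=1):
        # Build names directly: Path.with_suffix would treat ".0_sbt_1" as a suffix.
        rule_file = rules_dir / f"{base_name}_{n}.RULE"
        data_file = rules_dir / f"{base_name}_{n}.yml"
        rule_file.write_text(text + "\n", encoding="utf-8")
        data_file.write_text(RULE_DATA, encoding="utf-8")
        written += [rule_file, data_file]
    return written


def add_minimum_coverage(yml_file: Path, value: int = 50) -> None:
    """Step 1: append minimum_coverage to an existing rule .yml unless already set."""
    content = yml_file.read_text(encoding="utf-8")
    if "minimum_coverage:" in content:
        return
    if content and not content.endswith("\n"):
        content += "\n"
    yml_file.write_text(content + f"minimum_coverage: {value}\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    for path in write_rules(Path(tmp) / "rules"):
        print(path.name)
```

The `add_minimum_coverage` check is deliberately simple (a substring test), so running it twice leaves the file unchanged.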

@AyanSinhaMahapatra what do you think?