aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

Wrong license detection in oauthlib #3512

Closed: bennati closed this issue 10 months ago

bennati commented 1 year ago

ScanCode incorrectly detects the license https://scancode-licensedb.aboutcode.org/cdla-permissive-1.0.html in the file https://github.com/oauthlib/oauthlib/blob/master/oauthlib/__init__.py (line 9).

Tested with ScanCode versions 31.2.6 and 32.0.6.

Ran the command:

    scancode ~/Downloads/__init__.py --copyright --license --info --strip-root --timeout 300 --json-pp ./o.json

Contents of o.json (excerpt):

      "license_detections": [
        {
          "license_expression": "cdla-permissive-1.0 AND bsd-new",
          "matches": [
            {
              "score": 11.43,
              "start_line": 8,
              "end_line": 9,
              "matched_length": 4,
              "match_coverage": 11.43,
              "matcher": "3-seq",
              "license_expression": "cdla-permissive-1.0",
              "rule_identifier": "cdla-permissive-1.0_2.RULE",
              "rule_relevance": 100,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/cdla-permissive-1.0_2.RULE"
            },
            {
              "score": 99.0,
              "start_line": 9,
              "end_line": 9,
              "matched_length": 6,
              "match_coverage": 100.0,
              "matcher": "2-aho",
              "license_expression": "bsd-new",
              "rule_identifier": "bsd-new_143.RULE",
              "rule_relevance": 99,
              "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/bsd-new_143.RULE"
            }
          ],
          "identifier": "cdla_permissive_1_0_and_bsd_new-7cf334b9-947b-5100-0ae0-c0b7fbe7d68f"
        }
      ],
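
A quick way to spot suspicious matches like the first one in this output is to look at match_coverage, which reports how much of the matched rule's text was actually found: only 11.43% for the 3-seq cdla-permissive-1.0 match, versus 100% for the bsd-new match. The Python sketch below flags low-coverage matches; the 50% cut-off and the function name are hypothetical choices for this example, not ScanCode settings.

```python
from typing import Any

# Hypothetical cut-off chosen for this sketch; it is not a ScanCode option.
LOW_COVERAGE_THRESHOLD = 50.0


def low_coverage_matches(
    detections: list[dict[str, Any]], threshold: float = LOW_COVERAGE_THRESHOLD
) -> list[tuple[str, float, str]]:
    """Return (license_expression, match_coverage, rule_identifier) for every
    match whose coverage is below `threshold`."""
    flagged = []
    for detection in detections:
        for match in detection.get("matches", []):
            coverage = float(match.get("match_coverage", 0.0))
            if coverage < threshold:
                flagged.append(
                    (match["license_expression"], coverage, match["rule_identifier"])
                )
    return flagged


# The two matches from the excerpt above, trimmed to the fields used here.
detections = [
    {
        "license_expression": "cdla-permissive-1.0 AND bsd-new",
        "matches": [
            {"license_expression": "cdla-permissive-1.0", "match_coverage": 11.43,
             "matcher": "3-seq", "rule_identifier": "cdla-permissive-1.0_2.RULE"},
            {"license_expression": "bsd-new", "match_coverage": 100.0,
             "matcher": "2-aho", "rule_identifier": "bsd-new_143.RULE"},
        ],
    }
]

for expression, coverage, rule in low_coverage_matches(detections):
    print(f"{expression}: only {coverage}% of {rule} matched")
```

For a full scan, the same function could be applied to the license_detections list of each per-file entry, assuming ScanCode's usual JSON layout where per-file results sit under a top-level files list.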
DennisClark commented 1 year ago

Problem reproduced. @AyanSinhaMahapatra, please investigate.

Matched texts for cdla-permissive-1.0 AND bsd-new
oauthlib-3.2.2/oauthlib/__init__.py
Detected: cdla-permissive-1.0

    :copyright: (c) 2019 by The OAuthlib Community
    :license: BSD, see LICENSE for details.

Detected: bsd-new

    :license: BSD, see LICENSE for details.

Scan results attached: oauthlib-3.2.2.tar.gz_scan.json.zip

AyanSinhaMahapatra commented 1 year ago

Thanks @bennati for reporting the bug, and thanks @DennisClark! I've pushed some rules to fix this, along with a fix for another package license detection bug I found.

This is indeed a misdetection. It should be fixed by adding a required phrase to the matched rule cdla-permissive-1.0_2.RULE, so that this rule can no longer produce false matches like this one. We will also add a new BSD rule that includes the copyright part, to detect this notice more accurately.
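
The idea of a required phrase can be illustrated with a small, self-contained sketch. For illustration it assumes that a required phrase is marked with double curly braces {{ }} in the rule text, and that a match is discarded unless every required phrase appears contiguously in the matched text. The rule text, tokenizer and function names below are hypothetical and much simpler than ScanCode's actual matching code; in particular, the rule text is made up and is NOT the real content of cdla-permissive-1.0_2.RULE.

```python
import re

# Assumption for this sketch: required phrases are wrapped in {{ }} in a rule.
REQUIRED_PHRASE = re.compile(r"\{\{(.+?)\}\}")


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a simplification of real tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def required_phrases(rule_text: str) -> list[list[str]]:
    """Extract each {{...}} phrase of a rule as a token sequence."""
    return [tokenize(phrase) for phrase in REQUIRED_PHRASE.findall(rule_text)]


def contains_sequence(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` appears as a contiguous run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def keep_match(rule_text: str, matched_text: str) -> bool:
    """Keep a match only if every required phrase of its rule occurs in the
    matched text; otherwise treat it as a likely false positive."""
    tokens = tokenize(matched_text)
    return all(contains_sequence(tokens, p) for p in required_phrases(rule_text))


# Hypothetical rule that shares a few words ("Community", "see LICENSE for
# details") with the oauthlib header but requires the full license name.
rule = "Licensed under the {{Community Data License Agreement Permissive}}, see LICENSE for details."
matched = ":copyright: (c) 2019 by The OAuthlib Community\n:license: BSD, see LICENSE for details."
print(keep_match(rule, matched))  # False: the required phrase is absent
```

Even if a partial sequence match overlaps a few common words, the match is rejected because the rule's required phrase never appears in the scanned text.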

See https://github.com/nexB/scancode-toolkit/issues/3300 and the work-in-progress https://github.com/nexB/scancode-toolkit/pull/3254 (building on earlier work in https://github.com/nexB/scancode-toolkit/issues/2637). That work should eliminate cases like this entirely by adding required phrases like this one, at scale, across all of our 30k+ license rules. These cases are popping up more often, so I'll have to bite the bullet and go ahead with this sooner rather than later :sweat_smile:.

Also, @bennati, when we investigate bugs like this we use all the license diagnostics options: --license --license-text --license-text-diagnostics --license-diagnostics --license-references.
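
For a report like this one, the original scan command can be extended with those diagnostics options. The small Python helper below only assembles and prints the command line; the helper name is hypothetical, and actually running the command requires ScanCode to be installed.

```python
import shlex

# Options from the original report.
REPORT_OPTIONS = ["--copyright", "--license", "--info", "--strip-root", "--timeout", "300"]

# License diagnostics options listed above (--license is already included).
DIAGNOSTIC_OPTIONS = [
    "--license-text",
    "--license-text-diagnostics",
    "--license-diagnostics",
    "--license-references",
]


def build_scan_command(input_path: str, output_json: str) -> list[str]:
    """Assemble a scancode argument list with all license diagnostics enabled."""
    return ["scancode", input_path, *REPORT_OPTIONS, *DIAGNOSTIC_OPTIONS,
            "--json-pp", output_json]


cmd = build_scan_command("oauthlib/__init__.py", "o.json")
print(shlex.join(cmd))
# To run it (requires ScanCode): subprocess.run(cmd, check=True)
```

Keeping the command as an argument list avoids shell-quoting problems when the input path contains spaces; shlex.join is only used to display it.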

AyanSinhaMahapatra commented 10 months ago

Fixed, thanks @bennati! Closing.