Open DennisClark opened 2 years ago
@DennisClark this is already fixed in the LicenseDetection branch for the upcoming release: https://github.com/nexB/scancode-toolkit/tree/add-license-detection.
This is similar to Issue 2 in https://github.com/nexB/scancode-toolkit/issues/3069#issuecomment-1237003830 and to the issue reported by the Eclipse Foundation here: https://github.com/nexB/scancode-toolkit/issues/2878#issuecomment-1128612554
Here the detection rule is "unknown-intro-followed-by-match", i.e. an unknown license intro was matched and then followed by a proper detection, so the unknown can be dropped from the detection's license expression. This is achieved by tagging specific rules with is_license_intro set to True.
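To make the idea concrete, here is a minimal sketch of the "unknown-intro-followed-by-match" logic. It is not the actual ScanCode implementation: the function names are hypothetical, the matches are plain dicts, and the only fields assumed are `license_expression` and `is_license_intro` as they appear in the scan output. Joining the remaining expressions with `AND` is also just an assumption for illustration.

```python
def drop_followed_license_intros(matches):
    """Return the matches, minus any license intro that is followed by a
    non-intro (proper) match later in the same group.

    Hypothetical helper: `matches` is a list of dicts carrying the
    `is_license_intro` flag seen in the scan results.
    """
    kept = []
    for index, match in enumerate(matches):
        is_intro = match.get("is_license_intro", False)
        followed_by_proper_match = any(
            not later.get("is_license_intro", False)
            for later in matches[index + 1:]
        )
        if is_intro and followed_by_proper_match:
            # The intro only announces the license that follows; skip it.
            continue
        kept.append(match)
    return kept


def detection_expression(matches):
    """Combine the expressions of the kept matches, without duplicates."""
    expressions = []
    for match in drop_followed_license_intros(matches):
        expression = match["license_expression"]
        if expression not in expressions:
            expressions.append(expression)
    return " AND ".join(expressions)


# The two matches from this issue, reduced to the fields used here.
matches = [
    {"license_expression": "unknown-license-reference", "is_license_intro": True},
    {"license_expression": "mit-nagy", "is_license_intro": False},
]

print(detection_expression(matches))  # prints "mit-nagy"
```

An intro with nothing after it is kept in this sketch, so a lone "licensed under" would still surface as unknown-license-reference.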
New license detection looks like this:
"detected_license_expression": "mit-nagy",
"detected_license_expression_spdx": "LicenseRef-scancode-mit-nagy",
"license_detections": [
  {
    "license_expression": "mit-nagy",
    "detection_rules": [
      "unknown-intro-followed-by-match"
    ],
    "matches": [
      {
        "score": 50.0,
        "start_line": 3,
        "end_line": 3,
        "matched_length": 2,
        "match_coverage": 100.0,
        "matcher": "2-aho",
        "license_expression": "unknown-license-reference",
        "rule_identifier": "license-intro_2.RULE",
        "referenced_filenames": [],
        "is_license_text": false,
        "is_license_notice": false,
        "is_license_reference": false,
        "is_license_tag": false,
        "is_license_intro": true,
        "rule_length": 2,
        "rule_relevance": 50,
        "matched_text": "licensed under",
        "licenses": [
          {
            "key": "unknown-license-reference",
            "name": "Unknown License file reference",
            "short_name": "Unknown License reference",
            "category": "Unstated License",
            "is_exception": false,
            "is_unknown": true,
            "owner": "Unspecified",
            "homepage_url": null,
            "text_url": "",
            "reference_url": "https://scancode-licensedb.aboutcode.org/unknown-license-reference",
            "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE",
            "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.yml",
            "spdx_license_key": "LicenseRef-scancode-unknown-license-reference",
            "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/unknown-license-reference.LICENSE"
          }
        ]
      },
      {
        "score": 100.0,
        "start_line": 3,
        "end_line": 5,
        "matched_length": 24,
        "match_coverage": 100.0,
        "matcher": "2-aho",
        "license_expression": "mit-nagy",
        "rule_identifier": "mit-nagy.LICENSE",
        "referenced_filenames": [],
        "is_license_text": true,
        "is_license_notice": false,
        "is_license_reference": false,
        "is_license_tag": false,
        "is_license_intro": false,
        "rule_length": 24,
        "rule_relevance": 100,
        "matched_text": "Permission to use, copy, modify,\nand/or distribute this code for any purpose with or without fee is\nhereby granted. There is no warranty.\"",
        "licenses": [
          {
            "key": "mit-nagy",
            "name": "MIT Szabolcs Nagy Variant",
            "short_name": "MIT Nagy Variant",
            "category": "Permissive",
            "is_exception": false,
            "is_unknown": false,
            "owner": "Szabolcs Nagy",
            "homepage_url": null,
            "text_url": "https://git.musl-libc.org/cgit/musl/commit/src/prng/random.c?id=1569f396bb76e9d54f6c4492ed6778e37b87bc70",
            "reference_url": "https://scancode-licensedb.aboutcode.org/mit-nagy",
            "scancode_text_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/mit-nagy.LICENSE",
            "scancode_data_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/mit-nagy.yml",
            "spdx_license_key": "LicenseRef-scancode-mit-nagy",
            "spdx_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/mit-nagy.LICENSE"
          }
        ]
      }
    ]
  }
],
"license_clues": [],
There was also a bug in how we group matches into a LicenseDetection. I have fixed it so that license intros are factored in when doing this grouping.
Here are the scan results for you to look at:
Old scan just this issue: doris-issue-3078.json.txt
New scan just this issue: doris-add-license-detection-issue-3078.json.txt
Old scan entire file: doris-v31.1.1-LICENSE-dist.json.txt
New scan entire file: doris-add-license-detection-LICENSE-dist.json.txt
I scanned doris-1.1.1-rc03 (available at https://github.com/apache/doris/archive/refs/tags/1.1.1-rc03.tar.gz) using scancode-toolkit-31.0.2. Although it detected most of the licenses in the rather complex notice (attached) in doris-1.1.1-rc03/be/src/glibc-compatibility/musl/COPYRIGHT, it returns both unknown-license-reference and mit-nagy for this chunk of text:
Although the mit-nagy license is returned for this (and the quoted text is an exact match), the result first returns unknown-license-reference for the third line of this paragraph:
licensed under following terms: "Permission to use, copy, modify,
See lines 47951 through 48028 in the attached scan results to see both detection instances.
Summary: It appears that the scan was misled by the single line (problem) but it then found the correct license when it looked at the entire text (good). It would of course be best if nothing were returned for the false-positive match on unknown-license-reference.
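For anyone who wants to look for the same pattern elsewhere in a scan, a rough sketch follows. It flags unknown-license-reference matches whose line range overlaps a match for a known license. It assumes each match is a dict with `license_expression`, `start_line` and `end_line` (the field names used in the newer scan output); the function names are hypothetical and this is not part of ScanCode.

```python
def line_ranges_overlap(first, second):
    """True when the two matches share at least one line."""
    return (
        first["start_line"] <= second["end_line"]
        and second["start_line"] <= first["end_line"]
    )


def shadowed_unknown_matches(matches, unknown_key="unknown-license-reference"):
    """Return unknown matches that overlap a match for a known license.

    Hypothetical helper: such matches are candidates for the false
    positive described in this issue.
    """
    unknown = [m for m in matches if m["license_expression"] == unknown_key]
    known = [m for m in matches if m["license_expression"] != unknown_key]
    return [
        candidate
        for candidate in unknown
        if any(line_ranges_overlap(candidate, other) for other in known)
    ]


# Line ranges as reported for the two matches in this issue.
sample_matches = [
    {"license_expression": "unknown-license-reference", "start_line": 3, "end_line": 3},
    {"license_expression": "mit-nagy", "start_line": 3, "end_line": 5},
]

for match in shadowed_unknown_matches(sample_matches):
    print(match["license_expression"], match["start_line"], match["end_line"])
```

Overlap on line numbers is only a heuristic: an unknown reference on the same line as an unrelated license would be flagged too, so the results still need a manual look.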
COPYRIGHT.zip
doris-1.1.1-rc03-results.json.zip