aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

False positive of ms-azure-data-studio license #3535

Closed mthalman closed 8 months ago

mthalman commented 11 months ago

Description

ScanCode reports a false positive for the ms-azure-data-studio license on this section of text: https://github.com/microsoft/vstest/blob/8b75bc5d019f69733c6c0ef6bdd598f5b557c4cd/src/package/Microsoft.CodeCoverage/ThirdPartyNoticesCodeCoverage.txt#L8-L9

How To Reproduce


Download the contents of https://github.com/microsoft/vstest/blob/8b75bc5d019f69733c6c0ef6bdd598f5b557c4cd/src/package/Microsoft.CodeCoverage/ThirdPartyNoticesCodeCoverage.txt

scancode -l ThirdPartyNoticesCodeCoverage.txt --json-pp report.json

Relevant output:

{
  "path": "src/package/Microsoft.CodeCoverage/ThirdPartyNoticesCodeCoverage.txt",
  "type": "file",
  "detected_license_expression": "ms-azure-data-studio AND mit",
  "detected_license_expression_spdx": "LicenseRef-scancode-ms-azure-data-studio AND MIT",
  "license_detections": [
    {
      "license_expression": "ms-azure-data-studio",
      "matches": [
        {
          "score": 7.07,
          "start_line": 8,
          "end_line": 9,
          "matched_length": 14,
          "match_coverage": 7.07,
          "matcher": "3-seq",
          "license_expression": "ms-azure-data-studio",
          "rule_identifier": "ms-azure-data-studio.LICENSE",
          "rule_relevance": 100,
          "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/ms-azure-data-studio.LICENSE"
        }
      ],
      "identifier": "ms_azure_data_studio-826e28c6-9414-aed0-df09-ef932617f563"
    },
    {
      "license_expression": "mit",
      "matches": [
        {
          "score": 100.0,
          "start_line": 20,
          "end_line": 37,
          "matched_length": 161,
          "match_coverage": 100.0,
          "matcher": "2-aho",
          "license_expression": "mit",
          "rule_identifier": "mit.LICENSE",
          "rule_relevance": 100,
          "rule_url": "https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/licenses/mit.LICENSE"
        }
      ],
      "identifier": "mit-cacd5c0c-204a-85c2-affc-e4c125b2492a"
    }
  ],
  "license_clues": [],
  "percentage_of_license_text": 67.83,
  "scan_errors": []
}
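For anyone triaging similar reports, here is a minimal sketch of how weak matches like the one above could be flagged automatically. It only relies on the fields visible in the output above (`license_detections`, `matches`, `match_coverage`, `matcher`). The helper name `weak_matches` and the 50% cutoff are hypothetical choices for illustration, not part of ScanCode.

```python
def weak_matches(file_entry, min_coverage=50.0):
    """Yield (license_expression, match) pairs with low match coverage.

    `file_entry` is one file object shaped like the JSON shown above.
    `min_coverage` is an arbitrary cutoff assumed for this sketch.
    """
    for detection in file_entry.get("license_detections", []):
        for match in detection.get("matches", []):
            if match.get("match_coverage", 0.0) < min_coverage:
                yield detection["license_expression"], match


# Trimmed copy of the file entry from the report above.
sample_entry = {
    "path": "src/package/Microsoft.CodeCoverage/ThirdPartyNoticesCodeCoverage.txt",
    "license_detections": [
        {
            "license_expression": "ms-azure-data-studio",
            "matches": [
                {
                    "score": 7.07,
                    "start_line": 8,
                    "end_line": 9,
                    "match_coverage": 7.07,
                    "matcher": "3-seq",
                }
            ],
        },
        {
            "license_expression": "mit",
            "matches": [
                {
                    "score": 100.0,
                    "start_line": 20,
                    "end_line": 37,
                    "match_coverage": 100.0,
                    "matcher": "2-aho",
                }
            ],
        },
    ],
}

flagged = list(weak_matches(sample_entry))
for expression, match in flagged:
    print(
        f"{expression}: lines {match['start_line']}-{match['end_line']}, "
        f"coverage {match['match_coverage']}, matcher {match['matcher']}"
    )
```

With the entry above, only the ms-azure-data-studio match on lines 8-9 falls under the cutoff; the full-text MIT match is left alone. A low coverage value does not prove a false positive on its own, so treat the result as a list of candidates to review.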

System configuration

Linux, version 32.0.7, pip install

pombredanne commented 11 months ago

Thanks! This one is a mouthful! https://github.com/microsoft/vstest/blob/8b75bc5d019f69733c6c0ef6bdd598f5b557c4cd/src/package/Microsoft.CodeCoverage/ThirdPartyNoticesCodeCoverage.txt#L8-L9

This software incorporates components from the projects listed below. The original copyright notices and the licenses under which Microsoft received such components are set forth below and are provided for informational purposes only. Microsoft reserves all rights not expressly granted herein, whether by implication, estoppel or otherwise.

That said, we could go two ways with a new rule:

  1. Treat this text section as a false positive and ignore it.
  2. Treat it as a license introduction instead.

Since the wording looks like serious legalese, we may want to keep it around by going with option 2.

pombredanne commented 11 months ago

There seems to be a recurring pattern of similar text, as in https://github.com/microsoft/pxt-common-packages/blob/dbecae95b46eca9014a9b7d4a4c19ffdd02bb0f6/ThirdPartyNotice#L29 and https://github.com/search?q=org%3Amicrosoft+"NOTICES+AND+INFORMATION+BEGIN+HERE"&type=code. Would you know how and where this is generated?

AyanSinhaMahapatra commented 8 months ago

This was fixed, so closing.

vstest-ThirdPartyNoticesCodeCoverage.txt.json

Thanks @mthalman @pombredanne