aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

python-2.0 license false positive #2377

Open xu1119 opened 3 years ago

xu1119 commented 3 years ago

Description

When scanning the project https://github.com/keras-team/keras with the latest ScanCode, I get two false positives. The two files are: https://github.com/keras-team/keras/blob/master/keras/distribute/sidecar_evaluator.py and https://github.com/keras-team/keras/blob/master/keras/utils/generic_utils.py

Because "License" and "Python" appear on consecutive lines, with only punctuation between them, the pair is detected as a python-2.0 license. For example, in sidecar_evaluator.py:

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python module for evaluation loop."""

from __future__ import absolute_import

{
  "key": "python",
  "score": 100.0,
  "name": "Python Software Foundation License v2",
  "short_name": "Python License 2.0",
  "category": "Permissive",
  "is_exception": false,
  "owner": "Python Software Foundation (PSF)",
  "homepage_url": "http://docs.python.org/license.html",
  "text_url": "http://spdx.org/licenses/Python-2.0",
  "reference_url": "https://enterprise.dejacode.com/urn/urn:dje:license:python",
  "spdx_license_key": "Python-2.0",
  "spdx_url": "https://spdx.org/licenses/Python-2.0",
  "start_line": 13,
  "end_line": 15,
  "matched_rule": {
    "identifier": "python_13.RULE",
    "license_expression": "python",
    "licenses": [
      "python"
    ],
    "is_license_text": false,
    "is_license_notice": false,
    "is_license_reference": false,
    "is_license_tag": true,
    "matcher": "2-aho",
    "rule_length": 2,
    "matched_length": 2,
    "match_coverage": 100.0,
    "rule_relevance": 100.0
  },
  "matched_text": "# limitations under the License.\n# ==============================================================================\n\"\"\"Python utilities required by Keras.\"\"\""
}

and

{
  "key": "python",
  "score": 100.0,
  "name": "Python Software Foundation License v2",
  "short_name": "Python License 2.0",
  "category": "Permissive",
  "is_exception": false,
  "owner": "Python Software Foundation (PSF)",
  "homepage_url": "http://docs.python.org/license.html",
  "text_url": "http://spdx.org/licenses/Python-2.0",
  "reference_url": "https://enterprise.dejacode.com/urn/urn:dje:license:python",
  "spdx_license_key": "Python-2.0",
  "spdx_url": "https://spdx.org/licenses/Python-2.0",
  "start_line": 14,
  "end_line": 16,
  "matched_rule": {
    "identifier": "python_13.RULE",
    "license_expression": "python",
    "licenses": [
      "python"
    ],
    "is_license_text": false,
    "is_license_notice": false,
    "is_license_reference": false,
    "is_license_tag": true,
    "matcher": "2-aho",
    "rule_length": 2,
    "matched_length": 2,
    "match_coverage": 100.0,
    "rule_relevance": 100.0
  },
  "matched_text": "# limitations under the License.\n# ==============================================================================\n\"\"\"Python module for evaluation loop.\"\"\""
}

The text of python_13.RULE is: license = "Python"
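To illustrate why a two-token rule fires here, the following sketch uses a deliberately simplified tokenizer that lowercases and keeps only alphanumeric words, dropping all punctuation. That tokenizer is an assumption for illustration, not ScanCode's actual implementation, but it shows how "License.", the "===" separator line and the docstring's """Python collapse into the adjacent tokens "license python" that python_13.RULE matches.

```python
import re


def tokenize(text):
    """Simplified tokenizer (assumption): lowercase alphanumeric words only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_sequence(tokens, rule_tokens):
    """Return True if rule_tokens appear as one contiguous run inside tokens."""
    n = len(rule_tokens)
    return any(tokens[i:i + n] == rule_tokens for i in range(len(tokens) - n + 1))


# The rule text 'license = "Python"' reduces to two tokens.
rule_tokens = tokenize('license = "Python"')

# Header excerpt from sidecar_evaluator.py, as quoted in this issue.
header = (
    "# limitations under the License.\n"
    "# ==============================================================================\n"
    '"""Python module for evaluation loop."""\n'
)
tokens = tokenize(header)

print(rule_tokens)                             # ['license', 'python']
print(contains_sequence(tokens, rule_tokens))  # True
```

The separator line contributes no tokens at all, so the last word of the Apache notice and the first word of the docstring end up next to each other.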

How To Reproduce

scancode -li --license-text --json-pp - generic_utils.py
scancode -li --license-text --json-pp - sidecar_evaluator.py

System configuration

pombredanne commented 3 years ago

@xu1119 Thank you: a great find.

FWIW, when I want to validate a bug, I usually run a scan with these options: --license --license-text --license-text-diagnostics --json-pp -. So much so that I created an alias in my bash aliases:

alias sca='scancode --license --license-text --license-text-diagnostics --json-pp -'

This may help you.

That said, you found exactly the issue and the problematic rule. The way to resolve this is to create a false positive rule as follows:

  1. Create a src/licensedcode/data/rules/false-positive_not_python.RULE file with this content:
    python module
  2. Create the companion src/licensedcode/data/rules/false-positive_not_python.yml data file with this content:
    is_false_positive: yes
    notes: Do not detect "license python module" as a Python license.
      Seen in https://github.com/keras-team/keras/blob/ad9268d67014273e35faac4ff21cbfe929bf1d2b/keras/utils/generic_utils.py
      and reported in https://github.com/nexB/scancode-toolkit/issues/2377

    (The notes are required, but they can be anything useful to qualify and explain the false positive, so this is just a suggestion.)
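The two steps above can be scripted. The helper below is hypothetical and not part of ScanCode; it only writes the .RULE text file and the companion .yml data file described in the steps. It stores the notes as a YAML literal block ("notes: |") so that multi-line notes remain valid YAML; that layout choice is an assumption, and the indented plain-scalar form shown in step 2 is also valid YAML.

```python
from pathlib import Path


def write_false_positive_rule(rules_dir, base_name, rule_text, notes):
    """Write <base_name>.RULE and its companion <base_name>.yml (hypothetical helper)."""
    rules_dir = Path(rules_dir)
    rules_dir.mkdir(parents=True, exist_ok=True)

    rule_path = rules_dir / f"{base_name}.RULE"
    data_path = rules_dir / f"{base_name}.yml"

    # The .RULE file holds only the text to match, e.g. "python module".
    rule_path.write_text(rule_text + "\n", encoding="utf-8")

    # Indent every notes line under a literal block so multi-line notes parse.
    indented = "\n".join("  " + line for line in notes.splitlines())
    data_path.write_text(
        "is_false_positive: yes\nnotes: |\n" + indented + "\n",
        encoding="utf-8",
    )
    return rule_path, data_path
```

From the root of a scancode-toolkit checkout, one might call write_false_positive_rule("src/licensedcode/data/rules", "false-positive_not_python", "python module", notes) with the notes suggested above, then rebuild the index with scancode --reindex-licenses.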

With these two files added, you can then run scancode --reindex-licenses to rebuild the license detection index with the new rule. Then retry your scans: only an Apache license should be reported.
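To check the rescan, one could collect the detected license keys per file from the JSON output. This sketch assumes the older ScanCode JSON layout shown in this issue, where each entry in "files" has a "path" and a "licenses" list whose items carry a "key". Newer ScanCode releases use a different output structure, so the field names here are an assumption.

```python
import json


def license_keys_by_file(scan_json):
    """Map each scanned file path to the sorted list of detected license keys.

    Assumes the older layout: {"files": [{"path": ..., "licenses": [{"key": ...}]}]}.
    """
    data = json.loads(scan_json)
    result = {}
    for entry in data.get("files", []):
        keys = {lic["key"] for lic in entry.get("licenses", [])}
        result[entry["path"]] = sorted(keys)
    return result


def files_with_key(keys_by_file, key):
    """Return the paths where the given license key is still detected."""
    return sorted(path for path, keys in keys_by_file.items() if key in keys)
```

For example, after writing a scan with --json-pp out.json, files_with_key(license_keys_by_file(open("out.json").read()), "python") should return an empty list once the false positive rule is in place.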

In short: if the text "python module" is detected alone and is not part of a larger detection, it is a false positive and should be ignored.
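The rule described above can be modeled with a small toy example. This is a simplified illustration, NOT ScanCode's actual match-refinement algorithm. It assumes that a false positive match counts only when it is not contained in a longer regular match, and that it then suppresses any overlapping regular match that is no longer than itself. Match spans are token positions, and the rule names used in the test are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    rule: str                 # rule identifier (illustrative)
    start: int                # first matched token position, inclusive
    end: int                  # last matched token position, inclusive
    is_false_positive: bool = False

    @property
    def length(self):
        return self.end - self.start + 1

    def overlaps(self, other):
        return self.start <= other.end and other.start <= self.end

    def contains(self, other):
        return self.start <= other.start and other.end <= self.end


def drop_false_positives(matches):
    """Toy model of false positive filtering (assumption, not ScanCode's code)."""
    regular = [m for m in matches if not m.is_false_positive]
    false_pos = [m for m in matches if m.is_false_positive]

    # A false positive only applies when it stands "alone", i.e. it is not
    # part of a longer regular detection such as a full Python license text.
    standalone = [
        fp for fp in false_pos
        if not any(m.contains(fp) and m.length > fp.length for m in regular)
    ]

    # Drop short regular matches that overlap a standalone false positive;
    # false positive matches themselves are never reported.
    return [
        m for m in regular
        if not any(m.overlaps(fp) and m.length <= fp.length for fp in standalone)
    ]
```

With tokens "limitations under the license python module" at positions 0 to 5, the two-token "license python" match (3 to 4) overlaps the "python module" false positive (4 to 5) and is dropped, while a long Python license text that happens to contain "python module" is kept.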

Would you feel like submitting a PR for this? Otherwise, we can handle it if you are not comfortable doing it.

xu1119 commented 3 years ago

The other file, generic_utils.py, has this text:

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Python utilities required by Keras."""
from __future__ import absolute_import

Should the text "python utilities" also be added as a false positive rule?

pombredanne commented 3 years ago

Should the text "python utilities" also be added as a false positive rule?

@xu1119 Correct: that would mean adding another false positive rule for it.
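Since each distinct docstring opening needs its own rule, one might first list which phrases are affected across a codebase. This hypothetical helper uses a simplified tokenization (lowercase alphanumeric words, punctuation dropped; an assumption, not ScanCode's tokenizer). It reports the "python <word>" phrase wherever "python" directly follows "license", which gives a candidate text for each additional false positive rule.

```python
import re


def phrases_after_license(text):
    """Return 'python <word>' phrases where 'python' directly follows 'license'."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    phrases = []
    # Stop two tokens early so tokens[i + 2] always exists.
    for i in range(len(tokens) - 2):
        if tokens[i] == "license" and tokens[i + 1] == "python":
            phrases.append(f"python {tokens[i + 2]}")
    return phrases
```

Applied to the two Keras headers quoted in this issue, it yields "python module" for sidecar_evaluator.py and "python utilities" for generic_utils.py, matching the two rules discussed above.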