aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/

scancode crashes when supplying a PNG file to --license-policy #3594

Open armijnhemel opened 12 months ago

armijnhemel commented 12 months ago

Description

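Sanity checks for --license-policy are missing, so supplying a file that is not UTF-8 text (here, a PNG image) crashes the license-policy post-scan plugin with a UnicodeDecodeError.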

How To Reproduce

$ ./scancode -l scancode --spdx-tv /tmp/scancode.spdx --license-policy /tmp/tmp8ulw9skr.png 
Setup plugins...
Collect file inventory...
Scan files for: licenses with 1 process(es)...
[####################] 2                  
ERROR: failed to run post-scan plugin: license-policy:
Traceback (most recent call last):
  File "/home/armijn/git/scancode-toolkit/src/scancode/cli.py", line 1084, in run_codebase_plugins
    plugin.process_codebase(codebase, **kwargs)
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 77, in process_codebase
    if has_policy_duplicates(license_policy):
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 114, in has_policy_duplicates
    policies = load_license_policy(license_policy_location).get('license_policies', [])
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 141, in load_license_policy
    conf_content = conf.read()
  File "/usr/lib64/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

WARNING: Files are missing a SHA1 attribute. Incomplete SPDX document created.
Scanning done.
Some files failed to scan properly:
ERROR: failed to run post-scan plugin: license-policy:
Traceback (most recent call last):
  File "/home/armijn/git/scancode-toolkit/src/scancode/cli.py", line 1084, in run_codebase_plugins
    plugin.process_codebase(codebase, **kwargs)
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 77, in process_codebase
    if has_policy_duplicates(license_policy):
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 114, in has_policy_duplicates
    policies = load_license_policy(license_policy_location).get('license_policies', [])
  File "/home/armijn/git/scancode-toolkit/src/licensedcode/plugin_license_policy.py", line 141, in load_license_policy
    conf_content = conf.read()
  File "/usr/lib64/python3.10/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
Summary:        licenses with 1 process(es)
Errors count:   1
Scan Speed:     1.83 files/sec. 
Initial counts: 1 resource(s): 1 file(s) and 0 directorie(s) 
Final counts:   1 resource(s): 1 file(s) and 0 directorie(s) 
Timings:
  scan_start: 2023-11-17T150048.155768
  scan_end:   2023-11-17T150051.949987
  setup_scan:licenses: 3.24s
  setup: 3.24s
  scan: 0.55s
  total: 3.85s
Removing temporary files...done.
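
The traceback shows the failure happening while the policy file is read as text, before any parsing takes place. A minimal sketch of the kind of sanity check that could catch this early is given below. It is not scancode's actual code: `PolicyFileError`, `read_policy_text` and `load_policies` are hypothetical names, and `json.loads` only stands in for the real policy parser so that the example has no third-party dependencies. Only the `license_policies` key is taken from the traceback above.

```python
import json
from pathlib import Path


class PolicyFileError(Exception):
    """Hypothetical error type for an unusable policy file."""


def read_policy_text(location):
    """Return the policy file content as text, or raise PolicyFileError."""
    path = Path(location)
    if not path.is_file():
        raise PolicyFileError(f"Not a regular file: {location}")
    try:
        # A PNG starts with byte 0x89, which is not a valid UTF-8 start byte,
        # so decoding fails here instead of deep inside the plugin.
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as e:
        raise PolicyFileError(f"Not valid UTF-8 text: {location}") from e


def load_policies(location, parse=json.loads):
    """
    Return the list stored under 'license_policies'.
    `parse` is an assumption: any callable that turns text into a mapping.
    """
    text = read_policy_text(location)
    try:
        data = parse(text)
    except Exception as e:  # parser-specific errors vary, so catch broadly
        raise PolicyFileError(f"Cannot parse policy file: {location}") from e
    if not isinstance(data, dict):
        raise PolicyFileError(f"Policy file is not a mapping: {location}")
    policies = data.get("license_policies", [])
    if not isinstance(policies, list):
        raise PolicyFileError(f"'license_policies' is not a list: {location}")
    return policies
```

With a check like this, a caller could catch `PolicyFileError` once and report a short message naming the offending file rather than printing a full traceback.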

pombredanne commented 11 months ago

Closed in favor of https://github.com/nexB/scancode-toolkit/issues/3596