aboutcode-org / scancode-toolkit

:mag: ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and others generous sponsors!
https://aboutcode.org/scancode/
2.13k stars 551 forks source link

ScanCode scan crashes when used in ORT #1670

Closed kallesoranko closed 3 years ago

kallesoranko commented 5 years ago

Description

I am using ScanCode indirectly from ORT and I have faced two cases where ScanCode scan crashes and those seems to cause an NPE in ORT. Original ORT issue can be found at https://github.com/heremaps/oss-review-toolkit/issues/1406#issuecomment-517217030. Full logs can be found there but I add stacktraces below:

Traceback (most recent call last):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\Scripts\scancode-script.py", line 11, in <module>
    load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')()
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\utils.py", line 79, in main
    standalone_mode=standalone_mode, **extra)
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\cli.py", line 717, in scancode
    quiet=quiet, verbose=verbose, kwargs=kwargs,
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\cli.py", line 881, in run_scanners
    with_timing=timing, progress_manager=progress_manager)
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\cli.py", line 1027, in scan_codebase
    pool.terminate()
  File "c:\python27\Lib\multiprocessing\pool.py", line 465, in terminate
    self._terminate()
  File "c:\python27\Lib\multiprocessing\util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "c:\python27\Lib\multiprocessing\pool.py", line 513, in _terminate_pool
    p.terminate()
  File "c:\python27\Lib\multiprocessing\process.py", line 137, in terminate
    self._popen.terminate()
  File "c:\python27\Lib\multiprocessing\forking.py", line 312, in terminate
    _subprocess.TerminateProcess(int(self._handle), TERMINATE)
WindowsError: [Error 5] Access is denied
ERROR: failed to collect codebase at: u'C:\\Users\\z00407wp\\Desktop\\ORT\\devISF06\\Poct1\\scanner\\downloads\\Maven\\com.android.support.test.espresso\\espresso-core\\3.0.0'
Traceback (most recent call last):
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\cli.py", line 677, in scancode
    max_in_memory=max_in_memory
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\resource.py", line 243, in __init__
    self._populate()
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\resource.py", line 397, in _populate
    for top, dirs, files in os_walk(root.location, topdown=True, onerror=err):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 296, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "C:\Users\z00407wp\scancode-toolkit-2.9.7\lib\os.py", line 281, in walk
    onerror(err)
  File "c:\users\z00407wp\scancode-toolkit-2.9.7\src\scancode\resource.py", line 355, in err
    ('ERROR: cannot populate codebase: %(_error)r\n' % _error)
TypeError: format requires a mapping

System configuration

pombredanne commented 5 years ago

@kallesoranko Thank you for the report! Let me look into this though Scancode 2.9.7 is a very old version! @sschuberth is this still the version that is used in ORT? this is ancient :grimacing:

sschuberth commented 5 years ago

Yes, it is, @pombredanne... my (ex-)collegues at HERE (I'm at Bosch SI now) have https://github.com/heremaps/oss-review-toolkit/pull/1467 pending but there are internal reasons why this is not yet taken into use.

pombredanne commented 5 years ago

@sschuberth congrats!

To convey a sense of the magnitude of the changes since 2.9.7 when compared to the latest develop, here are the numbers of commits and files changed as of today:

https://github.com/nexB/scancode-toolkit/compare/v2.9.7...develop

pombredanne commented 5 years ago

@kallesoranko

  1. The first case is likely an heisenbug on Windows that is transient: are you able reproduces this consistently?

  2. For the second case, we can try a scan on Windows using these: https://repo.spring.io/libs-milestone/com/android/support/test/espresso/espresso-core/3.0.0/

Can you confirm this the package you had troubles with?

kallesoranko commented 5 years ago

I did run a scan over those components that I wasn't able to scan yesterday individually this morning and all of those were successful, which is really surprising since those failures seemed consistent yesterday. As @pombredanne suggested, it may have been heisenbug, caused by fragile Wndows environment. Anyway, it is nice that everything is working after all. Thank you both @pombredanne and @sschuberth for having time to look at this and @sschuberth congrats for the new position.

On my behalf this ticket and ticket in ort can be closed. Thanks again.

pombredanne commented 5 years ago

@kallesoranko Thank you for retesting... but this is not over! I really want to understand why this is happening ;)

kallesoranko commented 5 years ago

Not sure whether this helps but the ORT analyzer results for packages that fail the scan are:

- package:
    id: "Maven:org.apache.maven:maven-ant-tasks:2.1.3"
    purl: "pkg://Maven/org.apache.maven/maven-ant-tasks@2.1.3"
    declared_licenses:
      - "The Apache Software License, Version 2.0"
    declared_licenses_processed:
       spdx_expression: "Apache-2.0"
    description: "Ant Tasks used for Maven integration"
    homepage_url: "http://maven.apache.org/ant-tasks/"
    binary_artifact:
      url: ""
      hash:
        value: ""
        algorithm: ""
    source_artifact:
      url: "https://internnal.artifactory:443/artifactory/gradle-dev/org/apache/maven/maven-ant-tasks/2.1.3/maven-ant-tasks-2.1.3-sources.jar"
      hash:
        value: "49136cf7eb0e463e080ad2ebb0f65e91b19c8101"
        algorithm: "SHA-1"
    vcs:
      type: "Subversion"
      url: "http://svn.apache.org/repos/asf/maven/ant-tasks/tags/maven-ant-tasks-2.1.3"
      revision: ""
      path: ""
    vcs_processed:
      type: "Subversion"
      url: "http://svn.apache.org/repos/asf/maven/ant-tasks/tags/maven-ant-tasks-2.1.3"
      revision: ""
      path: ""
  curations: []

and for case2:

- package:
    id: "Maven:com.android.support.test.espresso:espresso-core:3.0.0"
    purl: "pkg://Maven/com.android.support.test.espresso/espresso-core@3.0.0"
    declared_licenses:
    - "The Apache Software License, Version 2.0"
    declared_licenses_processed:
      spdx_expression: "Apache-2.0"
      description: "The Android Testing Support Library provides an extensive framework\
        \ for testing Android apps"
      homepage_url: "https://developer.android.com/testing"
      binary_artifact:
        url: "https://internal.artifactory:443/artifactory/gradle-dev/com/android/support/test/espresso/espresso-core/3.0.0/espresso-core-3.0.0.aar"
        hash:
        value: "079501e10e8f7453dc07b924c995ac2a4b047c77"
        algorithm: "SHA-1"
      source_artifact:
        url: "https://internal.artifactory:443/artifactory/gradle-dev/com/android/support/test/espresso/espresso-core/3.0.0/espresso-core-3.0.0-sources.jar"
        hash:
        value: "c63c4329eec928810c1eb6bd2a32f31a58bdc157"
        algorithm: "SHA-1"
      vcs:
        type: ""
        url: ""
        revision: ""
        path: ""
      vcs_processed:
        type: ""
        url: ""
        revision: ""
        path: ""
  curations: []
kallesoranko commented 5 years ago

ORT updated the version of ScanCode they use to 3.0.2, but this same error keeps happening. Original issue at https://github.com/heremaps/oss-review-toolkit/issues/1731 and log from ORT:

Run pre-scans...
 Run pre-scan: ignore...
Scan files for: info, licenses, copyrights with 3 process(es)...
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\.editorconfig
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\index.js
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\license
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\package.json
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\readme.md
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\screenshot.png
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\test.js
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\.jshintrc
Scanned: C:\Users\z00407wp\Desktop\ORT\NEO\dev02\idm-ux\scanner2\downloads\NPM\unknown\ansi-styles\2.2.1\.travis.yml
Removing temporary files...done.
Traceback (most recent call last):
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\Scripts\scancode-script.py", line 11, in <module>
    load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')()
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\z00407wp\scancode-toolkit-3.0.2\src\scancode\utils.py", line 79, in main
    standalone_mode=standalone_mode, **extra)
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\z00407wp\scancode-toolkit-3.0.2\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\z00407wp\scancode-toolkit-3.0.2\src\scancode\cli.py", line 752, in scancode
    quiet=quiet, verbose=verbose, kwargs=kwargs,
  File "c:\users\z00407wp\scancode-toolkit-3.0.2\src\scancode\cli.py", line 929, in run_scanners
    with_timing=timing, progress_manager=progress_manager)
  File "c:\users\z00407wp\scancode-toolkit-3.0.2\src\scancode\cli.py", line 1075, in scan_codebase
    pool.terminate()
  File "c:\python27\Lib\multiprocessing\pool.py", line 465, in terminate
    self._terminate()
  File "c:\python27\Lib\multiprocessing\util.py", line 207, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "c:\python27\Lib\multiprocessing\pool.py", line 513, in _terminate_pool
    p.terminate()
  File "c:\python27\Lib\multiprocessing\process.py", line 137, in terminate
    self._popen.terminate()
  File "c:\python27\Lib\multiprocessing\forking.py", line 312, in terminate
    _subprocess.TerminateProcess(int(self._handle), TERMINATE)
WindowsError: [Error 5] Access is denied

Exception in thread "main" java.lang.NullPointerException
        at com.here.ort.scanner.scanners.ScanCode.getFileCount(ScanCode.kt:356)
        at com.here.ort.scanner.scanners.ScanCode.generateSummary$scanner(ScanCode.kt:360)
        at com.here.ort.scanner.scanners.ScanCode.scanPath(ScanCode.kt:324)
        at com.here.ort.scanner.LocalScanner.scanPackage(LocalScanner.kt:306)
        at com.here.ort.scanner.LocalScanner.access$scanPackage(LocalScanner.kt:70)
        at com.here.ort.scanner.LocalScanner$scanPackages$2$invokeSuspend$$inlined$map$lambda$1$2.invokeSuspend(LocalScanner.kt:183)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
        at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:238)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
        Suppressed: java.lang.NullPointerException
                ... 15 more

> Task :cli:run FAILED
pombredanne commented 5 years ago

@kallesoranko thank for following up... It could be a byzantine between the JVM multithreading and Python own threading and that on Windows only which is a platform with challenges.

Or more plainly it could be TOCTU problem

It seems to be a well known issue See:

sschuberth commented 5 years ago

Hmm, so if the "fix" is to unconditionally ignore the exception, I'm not so happy about that. Probably a better work-around would be (as also described in the StackOverflow answer) to catch the exception, then sleep, then terminate again. That second call to terminate() should not be catched.

pombredanne commented 5 years ago

@sschuberth good point. Catch, sleep 1/10s, retry, catch, sleep 1/2s, retry, Catch, sleep 1s, retry, Fail Some simple "exponential-like backoff" :) and a failure for sure if we have a failure

pombredanne commented 5 years ago

I pushed a fix that should help ... this is merged in develop.

pombredanne commented 5 years ago

See also #1781

pombredanne commented 3 years ago

This has been long merged in develop and released. Closing now.