aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://github.com/aboutcode-org/scancode-toolkit/releases/

After running for more than 24 hours, errored out with pool.py #2160

Open publicst opened 4 years ago

publicst commented 4 years ago

Description

After running the following command for over 36 hours, ScanCode removed its temporary files and then crashed without generating a report:

F:\WS\FG1.5-venv>scancode -lpceiu -n 4 --csv FG15-Int.csv F:\WS\FG1.5-Int-Installer --license-text

... [#-------------------] 107580 Scanned: Syncfusion.Grid.Grouping.Windows.dll
Removing temporary files...done.
Traceback (most recent call last):
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 720, in next
    item = self._items.popleft()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 726, in next
    item = self._items.popleft()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "f:\ws\FG1.5-venv\Scripts\scancode.exe\__main__.py", line 9, in <module>
  File "f:\ws\fg1.5-venv\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\utils.py", line 70, in main
    standalone_mode=standalone_mode, **extra)
  File "f:\ws\fg1.5-venv\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "f:\ws\fg1.5-venv\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "f:\ws\fg1.5-venv\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "f:\ws\fg1.5-venv\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\cli.py", line 513, in scancode
    *args, **kwargs)
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\cli.py", line 903, in run_scan
    quiet=quiet, verbose=verbose, kwargs=kwargs, echo_func=echo_func,
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\cli.py", line 1093, in run_scanners
    with_timing=timing, progress_manager=progress_manager)
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\cli.py", line 1187, in scan_codebase
    location, rid, scan_errors, scan_time, scan_result, scan_timings = next(scans)
  File "f:\ws\fg1.5-venv\lib\site-packages\click\_termui_impl.py", line 259, in next
    rv = next(self.iter)
  File "f:\ws\fg1.5-venv\lib\site-packages\scancode\pool.py", line 48, in wrap
    return func(self, timeout=timeout or 3600)
  File "C:\Users\XXX\AppData\Local\Programs\Python\Python36-32\lib\multiprocessing\pool.py", line 730, in next
    raise TimeoutError
multiprocessing.context.TimeoutError
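The last frames of the traceback show scancode/pool.py calling the pool iterator's next() with a default timeout of 3600 seconds, and the standard library raising multiprocessing.context.TimeoutError when no result arrives in that window. The sketch below reproduces only that standard-library behaviour in isolation. It is not ScanCode's code: slow_scan and first_result_or_timeout are hypothetical names, and it uses ThreadPool (which exposes the same iterator API as the process pool) with sub-second timeouts purely to keep the example small.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def slow_scan(path):
    """Hypothetical stand-in for scanning one file; sleeps to simulate slow work."""
    time.sleep(0.5)
    return path


def first_result_or_timeout(paths, timeout):
    """Return the first finished result, or the string "timeout" if none
    becomes available within `timeout` seconds."""
    with ThreadPool(processes=2) as pool:
        results = pool.imap_unordered(slow_scan, paths)
        try:
            # The pool iterator's next() accepts a timeout in seconds and
            # raises multiprocessing.TimeoutError when it expires.
            return results.next(timeout=timeout)
        except TimeoutError:
            return "timeout"


if __name__ == "__main__":
    print(first_result_or_timeout(["a.dll"], timeout=0.05))
    print(first_result_or_timeout(["a.dll"], timeout=5))
```

Read this way, the error in the report would mean that no scan result was delivered to the main process for a full hour. That is consistent with a stuck or extremely slow worker, though the traceback alone does not say which.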

How To Reproduce

Run ScanCode on a very large codebase with the following command:

F:\WS\FG1.5-venv>scancode -lpceiu -n 4 --csv FG15-Int.csv F:\WS\FG1.5-Int-Installer --license-text

System configuration

For bug reports, it really helps us to know:

pombredanne commented 4 years ago

Thank you and sorry for the late reply. Somehow it may be the same or related to this other bug https://github.com/nexB/scancode-toolkit/issues/2106

publicst commented 3 years ago

It may be related, but I am only running with 4 processes instead of 100 as in the other person's issue. I am running it on an Intel Core i7-5930K CPU @ 3.5 GHz with 32 GB of RAM.

Under what kind of system is this program expected to run and complete?

pombredanne commented 3 years ago

@publicst you wrote:

Under what kind of system is this program expected to run and complete?

I am sorry you are running into issues! It routinely runs on machines with a few GB of RAM. It must be a bug on Windows. Can you try the latest version with a 64-bit Python on Windows instead? It could be a 32-bit vs. 64-bit issue.

raratiru commented 2 years ago

What would be a "best practice" for scanning a virtualenv? In my case I started a full scan on a low-end machine and ran into this error on Debian Bullseye, Python 3.9.9, Intel i5-2400 @ 3.10 GHz, 8 GB RAM:

After 12 hours: (Of course scancode scans itself, would that be an issue?)

$ pwd
/home/user/project

$ cat Pipfile
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
django = "*"
argon2-cffi = "*"
psycopg2-binary = "*"
diskcache = "*"
sentry-sdk = "*"
gunicorn = "*"
uvicorn = "*"
yalm = "*"

[dev-packages]
ipdb = "*"
pdbpp = "*"
pytest-django = "*"
pytest-cov = "*"
pytest-factoryboy = "*"
django-debug-toolbar = "*"
pytest-splinter = "*"
pytest-bdd = "*"
pylint = "*"
flake8 = "*"
bandit = "*"
black = "==21.12b0"
safety = "*"
scancode-toolkit = "*"

[requires]
python_version = "3.9"

$ scancode -clpeui -n 2 --json-pp sample.json ./
...
Removing temporary files...done.
Traceback (most recent call last):
  File "/home/user/project/.venv/bin/scancode", line 8, in <module>
    sys.exit(scancode())
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/user/project/.venv/lib/python3.9/site-packages/commoncode/cliutils.py", line 69, in main
    return click.Command.main(
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/user/project/.venv/lib/python3.9/site-packages/scancode/cli.py", line 451, in scancode
    success, _results = run_scan(
  File "/home/user/project/.venv/lib/python3.9/site-packages/scancode/cli.py", line 887, in run_scan
    scan_success = run_scanners(
  File "/home/user/project/.venv/lib/python3.9/site-packages/scancode/cli.py", line 1125, in run_scanners
    scan_success = scan_codebase(
  File "/home/user/project/.venv/lib/python3.9/site-packages/scancode/cli.py", line 1236, in scan_codebase
    scan_timings) = next(scans)
  File "/home/user/project/.venv/lib/python3.9/site-packages/click/_termui_impl.py", line 116, in __next__
    return next(iter(self))
  File "/home/user/project/.venv/lib/python3.9/site-packages/commoncode/cliutils.py", line 172, in generator
    for rv in self.iter:
  File "/home/user/project/.venv/lib/python3.9/site-packages/scancode/pool.py", line 52, in wrap
    return func(self, timeout=timeout or 3600)
  File "/usr/local/lib/python3.9/multiprocessing/pool.py", line 865, in next
    raise TimeoutError from None
multiprocessing.context.TimeoutError

pombredanne commented 2 years ago

After 12 hours: (Of course scancode scans itself, would that be an issue?)

Yes, this would likely be a significant issue, as there are so many licenses in there that it would take forever and exhaust all resources.

BTW, I do not think it is the same issue as the one experienced here before.

In your case, you could --ignore the paths where ScanCode modules are installed, but there are several top level ones in a typical site-packages.

That could be a short-term workaround... but in the end there are possibly a few better ways:

  1. Install ScanCode separately and elsewhere as an app and not as a wheel, so that it does not live inside your current venv. I am toying with using AppImage, which would make this much simpler as it would have its own bundled Python and would be a single file.

  2. Have ScanCode be self-aware, so that it detects its own code and skips it (this could be extended to be aware of a few other license-related tools, though most are handled OK by ScanCode these days with the latest develop branch).

Alternatively, in the short term you could also run with only --package if all you care about is top-level package data. Help would be much appreciated in any case!
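For the --ignore workaround mentioned above, one way to find out which top-level site-packages entries a distribution installed is to read its recorded file list with importlib.metadata. This is a rough sketch under assumptions: top_level_names and installed_top_level_names are hypothetical helpers, not part of ScanCode, and ScanCode's dependencies are separate distributions that would each need the same lookup. How the resulting names should be written as --ignore patterns is not shown here and should be checked against the ScanCode documentation.

```python
from importlib import metadata
from pathlib import PurePosixPath


def top_level_names(file_paths):
    """Return the sorted, unique first path component of each relative path,
    skipping entries that point outside site-packages (starting with "..")."""
    names = set()
    for path in file_paths:
        parts = PurePosixPath(str(path)).parts
        if parts and parts[0] != "..":
            names.add(parts[0])
    return sorted(names)


def installed_top_level_names(distribution="scancode-toolkit"):
    """List top-level site-packages entries recorded for an installed
    distribution; returns an empty list if it is not installed or has
    no recorded file list."""
    try:
        files = metadata.files(distribution)
    except metadata.PackageNotFoundError:
        return []
    return top_level_names(files or [])


if __name__ == "__main__":
    # Run with the interpreter of the venv being scanned.
    for name in installed_top_level_names():
        print(name)
```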

raratiru commented 2 years ago

@pombredanne

BTW, I do not think it is the same issue as the one experienced here before.

Indeed.

I followed the idea of a separate virtual environment for ScanCode, ran it with -cl --package and got a firm "first-time result" I can work with.

Thank you!