Open kiranravindran90 opened 2 years ago
@kiranravindran90 Thanks: do you have instructions so we can reproduce your setup? including a full trace?
@pombredanne Find the full trace & instruction details.
Scan Instruction : scancode -clpi -n 2 --timeout 3600 --json
Package : (~15GB) Linux based Flashwares & source code.
File types : .bin, .bct, .dat, .dtb, .img, .cpio
Trace :
Removing temporary files...done.
Traceback (most recent call last):
File "C:................\scancode-toolkit-30.1.0_py39-windows\scancode-toolkit-30.1.0\Scripts\scancode-script.py", line 33, in
This is related to: #2160
I am facing the same issue.
OS: Linux (Debian) Python: 3.9 Scancode version: 30.1.0 Installation: download (container Dockerfile is public)
Timeout option set to: --timeout 18000
(5 * 3600 = 18000; 3600 seconds are one hour)
However, the scan still stops after 3600 seconds (60 minutes). I looked at https://github.com/nexB/scancode-toolkit/blob/develop/src/scancode/pool.py#L52. However, I could not really understand how it works.
The first, what is this line actually doing:
func(self, timeout=timeout or 3600)
What exactly does the statement timeout or 3600
does?
Second, what is the purpose of the wrapped
method/function?
@Jeeppler there is a new release of ScanCode and I would like to see if you are also experiencing the issue there. Can you try the latest 31.x? But there is no application archive for Python 3.9 ... only 3.8 for now. Let me review your Dockerfile at https://github.com/mercedes-benz/sechub/blob/develop/sechub-pds-solutions/scancode/docker/ScanCode-Debian.dockerfile
in https://github.com/nexB/scancode-toolkit/blob/6007301cca4eff424202abf197467ee2f004e139/src/scancode/pool.py#L52 The first, what is this line actually doing:
func(self, timeout=timeout or 3600)
What exactly does the statement timeout or 3600 does?
timeout=timeout or 3600
will assign the timeout
argument with the value of the timeout
variable if it has a value or with 3600
seconds (aka. 1 hour) otherwise if timeout
is None or 0. It essentially provides a default timeout if none was provided.
Second, what is the purpose of the wrapped method/function? in https://github.com/nexB/scancode-toolkit/blob/6007301cca4eff424202abf197467ee2f004e139/src/scancode/pool.py#L39
I reckon it feels a bit arcane. This is a function that returns a function such that calling the inner function will always be subject to a timeout. The code is there because of historical multiprocessing bugs that have been existing in the past and a lot of trial and error to get things working correctly (obviously not enough!):
... long story made short: there are likely some changes and fixes in the code of Python that mean we may not need anymore some of this machinery. The difficult thing here is that we support Windows, macOS and Linux and they have subtly different bugs in their Python implementations at times and not-so-subtle differences on how they each handle forking and spawing processes. We support all of them :|
If you think you could help track down these issues, and have a reliably failing test case, I could may be try to simplify the code in a branch and let you test this?
Can you try the latest 31.x? But there is no application archive for Python 3.9 ... only 3.8 for now.
Since Debian 11 comes with Python 3.9 it would be nice to have an archive for Python 3.9 otherwise I would have to install ScanCode via pip
. Are there any problems installing ScanCode via pip
rather than the archive?
If you think you could help track down these issues, and have a reliably failing test case, I could may be try to simplify the code in a branch and let you test this?
I would love to help fix the bug, it includes coding something as well.
However, I was unable so far to find public test cases/files to test it with. The files I use, I am not allowed to share. I tried it with ISO images of Linux Distributions, for example with the one from Rocky Linux. Those images are huge, but the scan takes only a few minutes.
In case you have an idea what type of binaries could work let me know. I am willing to test them.
Related to: #2703
Can I disable the timeout if I set it to 0
? Or does it set the fallback in this case to 3600 seconds?
I was able to reproduce the issue with the Armbian image of the Jetson Nano:
Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
The image takes more than an hour to scan and on Debian 11 with Scancode-Toolkit version: 30.1.0. I get the following output:
/tools/scancode-toolkit-30.1.0/src/cluecode/copyrights.py:3382: FutureWarning: Possible set difference at position 3
remove_tags = re.compile(
Setup plugin: scan:info...
Setup plugin: scan:licenses...
Setup plugin: scan:copyrights...
Setup plugin: scan:packages...
Setup plugin: scan:emails...
Setup plugin: scan:urls...
Setup plugin: output:spdx-tv...
Collect file inventory...
Scan files for: info, licenses, copyrights, packages, emails, urls with 2 process(es)...
Removing temporary files...done.
Traceback (most recent call last):
File "/tools/scancode-toolkit-30.1.0/bin/scancode", line 33, in <module>
sys.exit(load_entry_point('scancode-toolkit', 'console_scripts', 'scancode')())
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/commoncode/cliutils.py", line 69, in main
return click.Command.main(
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 451, in scancode
success, _results = run_scan(
File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 887, in run_scan
scan_success = run_scanners(
File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 1125, in run_scanners
scan_success = scan_codebase(
File "/tools/scancode-toolkit-30.1.0/src/scancode/cli.py", line 1236, in scan_codebase
scan_timings) = next(scans)
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/_termui_impl.py", line 116, in __next__
return next(iter(self))
File "/tools/scancode-toolkit-30.1.0/lib/python3.9/site-packages/click/_termui_impl.py", line 328, in generator
for rv in self.iter:
File "/tools/scancode-toolkit-30.1.0/src/scancode/pool.py", line 52, in wrap
return func(self, timeout=timeout or 3600)
File "/usr/lib/python3.9/multiprocessing/pool.py", line 865, in next
raise TimeoutError from None
multiprocessing.context.TimeoutError
I set --timeout 18000
(5 hours). However, the scan only takes a hour and then I get the multiprocessing.context.TimeoutError
.
@Jeeppler Thanks for digging further. https://redirect.armbian.com/jetson-nano/Jammy_current is 350MB of xzipped data is massive. So I guess that we should rather skipp scanning such an image for anything but file information. We can do it just based on the size, with a default and a possible new CLI option?
What should be the default max size above which it does not make sense to attempt scanning? 100MB? 200MB?
(FWIW, the uncompressed disk image above is 1.7GB ;) )
Can I disable the timeout if I set it to
0
? Or does it set the fallback in this case to 3600 seconds?
you can but that may make all fail. You can make it super short as 0.001 second if you want
So I guess that we should rather skipp scanning such an image for anything but file information
No, no skipping of the file. The requirement is to scan files in that size or bigger for license information. It takes hours, but Scancode can find license information in files of that size.
In case, you have more questions about the use case you can reach me via email using the email address located in this file: https://github.com/mercedes-benz/sechub/blob/develop/MAINTAINERS.md
The problem is, that even if I set the --timeout
above 3600
I get the multiprocessing.context.TimeoutError
after exactly one hour (3600 seconds = 1 hour). My expectation, would be that --timeout 18000
will increase the timeout duration to 5 hours and the scan might work.
I encountered the problem under Linux. It apparently works under Windows.
I ran the test again with the new version of scancode
under Ubuntu 22.04 (Mint 21).
Setup:
$ python3 --version
Python 3.10.6
$ scancode --version
ScanCode version: 31.2.1
ScanCode Output Format version: 2.0.0
SPDX License list version: 3.17
I installed scancode
via pip
.
$ time scancode --timeout 3800 --verbose --strip-root --copyright --license --package --email --url --info --spdx-tv scan.txt Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
Setup plugin: scan:info...
Setup plugin: scan:licenses...
Setup plugin: scan:copyrights...
Setup plugin: scan:packages...
Setup plugin: scan:emails...
Setup plugin: scan:urls...
Setup plugin: output:spdx-tv...
Collect file inventory...
Scan files for: info, licenses, copyrights, packages, emails, urls with 1 process(es)...
Removing temporary files...done.
Traceback (most recent call last):
File "/home/user/.local/bin/scancode", line 8, in <module>
sys.exit(scancode())
File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/commoncode/cliutils.py", line 70, in main
return click.Command.main(
File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/user/.local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 452, in scancode
success, _results = run_scan(
File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 905, in run_scan
scan_success = run_scanners(
File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 1144, in run_scanners
scan_success = scan_codebase(
File "/home/user/.local/lib/python3.10/site-packages/scancode/cli.py", line 1252, in scan_codebase
scan_timings) = next(scans)
File "/home/user/.local/lib/python3.10/site-packages/click/_termui_impl.py", line 116, in __next__
return next(iter(self))
File "/home/user/.local/lib/python3.10/site-packages/click/_termui_impl.py", line 328, in generator
for rv in self.iter:
File "/home/user/.local/lib/python3.10/site-packages/scancode/pool.py", line 52, in wrap
return func(self, timeout=timeout or 3600)
File "/usr/lib/python3.10/multiprocessing/pool.py", line 868, in next
raise TimeoutError from None
multiprocessing.context.TimeoutError
real 60m43,960s
user 4m13,170s
sys 3m25,289s
Scancode was running 60 minutes, before raising the multiprocessing.context.TimeoutError
. However, I specified a timeout of --timeout 3800
, therefore scancode
should have ran for 63 minutes.
@Jeeppler Thanks.... clearly we need to revisit the timeout story! Thank you for all these efforts.
@Jeeppler BTW https://github.com/mercedes-benz/sechub looks awesome! You may find https://github.com/nexb/vulnerablecode and https://github.com/nexB/vulnerablecode/pull/801 of some interest.
@Jeeppler
Since Debian 11 comes with Python 3.9 it would be nice to have an archive for Python 3.9
OK. Let's have a issue to track this. Do you mind to enter one? This should be easy enough now that we automated the whole release build. An alternative could be an appimage of sorts. See also #2836
otherwise I would have to install ScanCode via pip. Are there any problems installing ScanCode via pip rather than the archive?
This should work fine and is tested on each build using the latest version of all deps with https://github.com/nexB/scancode-toolkit/blob/ded56e9120f5fdfb9a1a0309130bb4305a66aacb/azure-pipelines.yml#L194 But I cannot gaurantee that all the versions of the dependencies that otherwise satisfy the constraints work, as we do retest previously released versions with newer deps: that would be practically difficult and resource hungry.
If you want to have the deps "same-as-the-app", then something such as this may be better as it will use the same versions as the app:
pip install --constraint https://raw.githubusercontent.com/nexB/scancode-toolkit/v31.2.1/requirements.txt scancode-toolkit==31.2.1
@pombredanne
Thanks for explaining what the difference between the package archive and the pip
installation is. I did not know about the --constraint
flag, that is interesting.
Furthermore:
I did another test with the Armbian image: Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
. What I observed was that, first the scancode
process has a waiting channel
of 0
, meaning it is running. The CPU usage is very high. One core will be at 100% of utilization. Indicating to me: scancode
does it's work.
After a few minutes (approx. 3-5), all the memory and the entire swap partition are used. See attached screenshots.
Then suddenly, CPU utilization and memory drop. Looking at the processes. The scancode
process waiting channel
will be stuck at pipe_read
. I am not sure what happened with scancode:
Scan command used:
time scancode --timeout 36000 --verbose --strip-root --license --spdx-tv Armbian.txt Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
I see the same pattern with a 64 GB machine.
Is there are way to profile/investigate it?
I was able to confirm: pool.py#L52
is the problem. We overwrote it with a fixed value: https://github.com/mercedes-benz/sechub/blob/develop/sechub-pds-solutions/scancode/docker/Scancode-Debian.dockerfile#L52. Now, it does work for us.
I am not sure what the issue is with Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
and the memory usage. However, Scancode does need a lot of memory to scan a large file.
@Jeeppler Very nicely finding. Thank you. This may be related to https://github.com/nexB/scancode-toolkit/issues/3143 by @nnobelis @nnobelis Do you mind to mind this patch on your big scan? @Jeeppler Do you mind to submit a patch/PR?
I am not sure what the issue is with
Armbian_22.08.1_Jetson-nano_jammy_current_5.18.19.img
and the memory usage. However, Scancode does need a lot of memory to scan a large file.
Each whole file is loaded in memory indeed
@pombredanne I am happy to submit a patch/PR and fix this issue.
However, my current solution is to overwrite pool.py
with brute force. I would have to look at the code in depth to understand it better. I probably will have more question along the way about the Scancode code base.
My expectation is, that the fix allows the user to set the value using the --timeout
parameter. I do not consider hard coding the timeout value to 10 hours as a long term solution.
@pombredanne No sorry I cannot test the changes unless there is a ScanCode release of it (I get ScanCode from PIP when building the Docker image).
@nnobelis we install the Scancode using pip
as well, but later overwrite the pool.py
in the container:
pool.py
: https://github.com/mercedes-benz/sechub/blob/9bec3e053a9ba5498c2ed81799bf6c3f398fb680/sechub-pds-solutions/scancode/docker/pool.pyThanks @Jeeppler. I will have a look when I have capacity.
Getting traceback calls & no report is generated.
How To Reproduce
While scanning source code files, after encountering a .sh file, this error seemed to be popping out (image below).
Many Traceback calls were followed by a "Multiprocessing Context Timeout Error"
System configuration