aboutcode-org / scancode-toolkit

ScanCode detects licenses, copyrights, dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!
https://aboutcode.org/scancode/
Scancode process getting killed while scanning with package and license parameter #1950

Open ChinnuJose opened 4 years ago

ChinnuJose commented 4 years ago

@pombredanne

Description

Scancode process getting killed while scanning with package and license parameter

A similar issue was reported earlier in https://github.com/nexB/scancode-toolkit/issues/1520. That issue was resolved when we upgraded scancode from 2.2.1 to 3.0.2.

How To Reproduce

When we scan a package with the -c -p -l parameters, the scan process gets killed. The scan succeeds with the -c parameter alone; every other combination fails.

System configuration

System: RHEL7
Scancode version: 3.0.2
Scancode installation method: ZIP file download

ChinnuJose commented 4 years ago

@pombredanne Could you please comment on this

pombredanne commented 4 years ago

@ChinnuJose thank you for the report and sorry for the late reply! Is this something that still happens with 3.0.2? How much RAM do you have on this machine or VM?

ummer9987 commented 4 years ago

@pombredanne This was happening with version 2.2.1; after we upgraded, the issue went away. Now we are facing the same issue in 3.0.2. It happens on the scancode server integrated with our application's production server, which triggers more scancode runs than the sandbox. Is there any limit on the number of packages that can be scanned?

RAM: 32GB
RHEL: Red Hat Enterprise Linux Server release 7.3 (Maipo)

ummer9987 commented 4 years ago

@pombredanne Any updates?

ummer9987 commented 4 years ago

@pombredanne Is there any limit on using scancode on packages? We are still facing the issue: after some package uploads, scancode stops working. Please find the error screenshot attached.

image

Pratikrocks commented 4 years ago

@ummer9987 You can try the latest version, 3.1.2 (from the develop branch). I think this problem should not arise with that version.

Pratikrocks commented 4 years ago

@ummer9987 I think this error may also be caused by the large number of processes you are running in parallel (10). You can reduce that number, or omit the option altogether, and try again.

ummer9987 commented 4 years ago

@Pratikrocks We get the same error both when we reduce the number of processes and when we remove the option.

image

Pratikrocks commented 4 years ago

@ummer9987 Okay, try this command and share the results:

./scancode -clip --json-pp - samples

ummer9987 commented 4 years ago

image

Pratikrocks commented 4 years ago

Your system configuration is more than adequate for running scancode, so I still do not understand why this error is occurring. However, you can also try scancode version 3.1.2 from the latest branch.

knbknb commented 4 years ago

This is perhaps an out-of-memory error. This might happen because your system is short on RAM and/or it has no swapfile.

Try this on the command line if you are using Linux:

grep -i oom /var/log/syslog | tail -100

This filters the syslog for "oom" - Out of Memory signals.

For me it looks like this:

Sep  3 10:14:01 mdis-jet kernel: [1195797.856737] [  25464]  1005 25464     2897       77    65536        0             0 scancode
Sep  3 10:14:01 mdis-jet kernel: [1195797.856738] [  25478]  1005 25478   319206   283847  2572288        0             0 scancode
Sep  3 10:14:01 mdis-jet kernel: [1195797.856740] [  31877]  1005 31877     1156       16    53248        0             0 sh
Sep  3 10:14:01 mdis-jet kernel: [1195797.856741] [  31878]  1005 31878     1156       17    53248        0             0 sh
Sep  3 10:14:01 mdis-jet kernel: [1195797.856742] [  31879]  1005 31879     1156       43    53248        0             0 byobu-status
Sep  3 10:14:01 mdis-jet kernel: [1195797.856743] [  31880]  1005 31880     1156       43    61440        0             0 byobu-status
Sep  3 10:14:01 mdis-jet kernel: [1195797.856744] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice,task=scancode,pid=25478,uid=1005
Sep  3 10:14:01 mdis-jet kernel: [1195797.856754] Out of memory: Killed process 25478 (scancode) total-vm:1276824kB, anon-rss:1135384kB, file-rss:4kB, shmem-rss:0kB, UID:1005 pgtables:2512kB oom_score_adj:0
Sep  3 10:14:01 mdis-jet kernel: [1195797.937530] oom_reaper: reaped process 25478 (scancode), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

This indeed indicates that my host was out of memory when running scancode.
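For anyone who wants to pull these kills out of a long log programmatically, here is a minimal Python sketch. It assumes the kernel message wording shown in the log above ("Out of memory: Killed process ..."); other kernel versions may word it differently, so the pattern may need adjusting. The function name `find_oom_kills` is made up for this example.

```python
import re

# Matches kernel lines of the form seen above, for example:
#   Out of memory: Killed process 25478 (scancode) total-vm:1276824kB, anon-rss:1135384kB, ...
# Assumption: other kernel versions may use a different wording.
OOM_KILL = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def find_oom_kills(lines, process_name=None):
    """Return one dict per OOM kill found in the given log lines.

    If process_name is given, only kills of that process are returned.
    """
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match is None:
            continue
        if process_name is not None and match["name"] != process_name:
            continue
        kills.append({
            "pid": int(match["pid"]),
            "name": match["name"],
            "total_vm_kb": int(match["total_vm_kb"]),
            "anon_rss_kb": int(match["anon_rss_kb"]),
        })
    return kills
```

To use it, open the log file (for example /var/log/syslog, as in the grep command above) and pass the file object as `lines`, with `process_name="scancode"` to ignore other victims.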

george-kuanli-peng commented 3 years ago

I also have the same problem with the latest release of scancode-toolkit (3.2.3), installed with pip install scancode-toolkit[full].

Running it with scancode -clpeui --json-pp output.json --csv output.csv --html output.html --spdx-rdf output.rdf --spdx-tv output.tv samples results in

Setup plugins...
Killed

The out-of-memory logs:

Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.495144] systemd-udevd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-1000
Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.495183]  oom_kill_process.cold+0xb/0x10
Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.495371] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.495641] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-3.scope,task=scancode,pid=6798,uid=1000
Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.495650] Out of memory: Killed process 6798 (scancode) total-vm:931028kB, anon-rss:819296kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:1736kB oom_score_adj:0
Nov 19 18:50:31 itri-ifuzz-virtualbox kernel: [14480.538306] oom_reaper: reaped process 6798 (scancode), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.129304] gmain invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.129321]  oom_kill_process.cold+0xb/0x10
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.129404] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.129530] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-3.scope,task=scancode,pid=6869,uid=1000
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.129539] Out of memory: Killed process 6869 (scancode) total-vm:931004kB, anon-rss:808496kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:1720kB oom_score_adj:0
Nov 19 18:56:24 itri-ifuzz-virtualbox kernel: [14833.183142] oom_reaper: reaped process 6869 (scancode), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

pombredanne commented 3 years ago

@george-kuanli-peng Your issue does indeed seem to be insufficient RAM. Each scancode process needs about 1GB of RAM. Typically you would need somewhere around 8GB for comfortable usage. You seem to be running a VM: how much RAM do you have there?
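The rule of thumb above (about 1GB per scancode process) can be turned into a rough estimate of how many parallel processes a machine can afford. This is only a sketch: the function names are made up for this example, the 1GB reserve for the rest of the system is an assumption, and reading MemAvailable from /proc/meminfo assumes a Linux kernel that exposes that field.

```python
KB_PER_GB = 1024 * 1024


def read_mem_available_kb(meminfo_path="/proc/meminfo"):
    """Return the MemAvailable value from /proc/meminfo, in kB (Linux only)."""
    with open(meminfo_path) as meminfo:
        for line in meminfo:
            if line.startswith("MemAvailable:"):
                # Line format: "MemAvailable:    8388608 kB"
                return int(line.split()[1])
    raise RuntimeError("MemAvailable not found in " + meminfo_path)


def suggest_process_count(available_kb,
                          per_process_kb=KB_PER_GB,   # ~1GB per process, per the comment above
                          reserve_kb=KB_PER_GB):      # assumption: keep 1GB free for the system
    """Estimate how many scan processes fit in the available memory.

    Returns 0 when even a single process may not fit.
    """
    usable_kb = max(available_kb - reserve_kb, 0)
    return usable_kb // per_process_kb
```

For example, `suggest_process_count(read_mem_available_kb())` gives a number that could be passed to scancode's processes option. A result of 0 suggests that the machine or VM needs more RAM before scanning.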

george-kuanli-peng commented 3 years ago

Yes, I am running scancode in a VM with 2GB of memory. The same command now works with 4GB of memory space.

P-EB commented 2 years ago

On my Debian Testing workstation with 24G RAM, running scancode on scancode-toolkit/src leads to an OOMkill with only one process.

Using --max-in-memory 100 makes the "Collect file inventory..." step last forever (no file is actually scanned), so it does not seem to help manage RAM consumption while keeping scancode working. (Edit: it eventually started to scan files, but was also OOM-killed.)

Is there a way to properly manage scancode's memory footprint?
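One generic workaround, not specific to scancode, is to cap the address space of the scan process so that it fails on its own instead of triggering the kernel OOM killer for the whole host. The sketch below uses Python's standard `resource` and `subprocess` modules on Linux. It does not reduce memory usage; it only bounds it. The helper names are made up for this example, and the limit applies separately to each process, so a multi-process scan can still use several times the cap in total.

```python
import resource
import subprocess


def _cap_address_space(max_bytes):
    """Build a function that lowers RLIMIT_AS in the child before exec."""
    def apply_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # The soft limit may not exceed an existing hard limit.
        if hard == resource.RLIM_INFINITY:
            soft = max_bytes
        else:
            soft = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
    return apply_limit


def run_with_memory_cap(command, max_bytes):
    """Run command with its virtual address space capped at max_bytes.

    Allocations beyond the cap fail inside the process (for a Python
    program, typically as MemoryError) rather than invoking the OOM killer.
    """
    return subprocess.run(command, preexec_fn=_cap_address_space(max_bytes))


# Hypothetical usage, capping a scan at 4GiB of address space:
# run_with_memory_cap(["scancode", "-clp", "--json-pp", "output.json", "samples"], 4 * 1024**3)
```

Note that RLIMIT_AS limits virtual address space, which is usually larger than resident memory, so the cap needs some headroom above the expected RSS.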

pombredanne commented 2 years ago

@P-EB sorry for this. Can you provide some details?

P-EB commented 2 years ago

@pombredanne that's no problem! :)

BTW, totally unrelated, but may I ask if there's a way to build scancode-toolkit for Debian without bundling all these third-party modules? Just the Python work you wrote, using what's needed from the system packages?

I was thinking scancode-toolkit relies only on what's in requirements.txt, but it seems it relies on far more. I'd like to use Debian packages as much as possible instead of bundled software. :)

pombredanne commented 2 years ago

scancode-toolkit/src

this is like running tar on a tar bomb or running an infinite loop and watching it loop forever... :) so that's not a great candidate... there are about 30,000+ license notices and texts in src / ;)

pombredanne commented 2 years ago

@P-EB

BTW, totally unrelated, but may I ask if there's a way to build scancode-toolkit for Debian without bundling all these third-party modules? Just the Python work you wrote, using what's needed from the system packages?

I was thinking scancode-toolkit relies only on what's in requirements.txt, but it seems it relies on far more. I'd like to use Debian packages as much as possible instead of bundled software. :)

The day we can have all the versions of these Python packages in Debian, scancode will be able to run from system dependencies. Not until then. All the packages in the requirements.txt file are used: direct dependencies are declared in setup.cfg, and the whole dependency tree with exact pinned versions is in requirements.txt.

@maxyz started quite a bit of packaging work a while back and there has been some more work done by @aj4ayushjain, but this still needs quite some love to be completed.

This is tracked in https://github.com/nexB/scancode-toolkit/issues/1580 FWIW... There are three types of dependencies:

  1. pure Python packages
  2. Python packages with native code
  3. Python packages with native code bundled from system deps https://github.com/nexB/scancode-plugins/tree/main/builtins

All of these can be ported one by one, and none of them is superfluous. The type 3 packages have been carefully designed so that they can optionally use system packages instead; this was done specifically to make a Debian port easier.

pombredanne commented 2 years ago

@P-EB re Debian port see also https://github.com/nexB/scancode-toolkit/issues/487

P-EB commented 2 years ago

Thanks for the links!

I'll try to see what dependencies are missing in Debian.
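A quick way to get a first list is to compare the pins in requirements.txt against what the system Python already provides. The sketch below assumes the file uses plain `name==version` lines, as described above for exact pinned versions, and it only sees distributions visible to the interpreter that runs it, so it should be run with the system python3. The function names are made up for this example.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {name: version} for lines of the form 'name==version'.

    Comments, blank lines and lines without '==' are ignored.
    """
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        # Drop any environment marker after ';'.
        pins[name.strip()] = version.split(";", 1)[0].strip()
    return pins


def compare_with_installed(pins):
    """Sort pins into 'ok', 'different' and 'missing' for this interpreter."""
    report = {"ok": [], "different": [], "missing": []}
    for name, wanted in sorted(pins.items()):
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            report["missing"].append(name)
            continue
        key = "ok" if found == wanted else "different"
        report[key].append((name, wanted, found))
    return report
```

Reading requirements.txt, passing its text to `parse_pins`, and feeding the result to `compare_with_installed` gives three lists: packages already at the pinned version, packages present at another version, and packages not installed at all.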