fabric8-analytics / fabric8-analytics-worker

fabric8-analytics worker for gathering raw data
GNU General Public License v3.0
8 stars 45 forks

[source_licenses] worker chokes on pypi:py-unrar2:0.99.6 #172

Closed: msrb closed this issue 7 years ago

msrb commented 7 years ago

pypi:py-unrar2:0.99.6 is a super ugly package. The scan doesn't finish within 20 minutes on stage/prod, and it seems the subprocess doesn't even raise a TimeoutExpired exception.

msrb commented 7 years ago

With timeout=2:

$ time python3 -c 'import subprocess; subprocess.check_output(["/opt/scancode-toolkit-2.0.1/scancode", "/tmp/license/pyUnRAR2-0.99.6/"], timeout=2)'
Scanning files for: licenses, copyrights, packages with 1 process(es)...
Building license detection index...Done.
Scanning files...
[####################] 48                                        
Scanning done.
Scan statistics: 48 files scanned in 29s.
Scan options:    licenses, copyrights, packages with 1 process(es).
Scanning speed:  1.71 files per sec.
Scanning time:   28s.
Indexing time:   1s.
Saving results.
Traceback (most recent call last):
  File "/usr/lib64/python3.5/subprocess.py", line 385, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/lib64/python3.5/subprocess.py", line 801, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.5/subprocess.py", line 1447, in _communicate
    self._check_timeout(endtime, orig_timeout)
  File "/usr/lib64/python3.5/subprocess.py", line 829, in _check_timeout
    raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command '['/opt/scancode-toolkit-2.0.1/scancode', '/tmp/license/pyUnRAR2-0.99.6/']' timed out after 2 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python3.5/subprocess.py", line 316, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.5/subprocess.py", line 390, in run
    stderr=stderr)
subprocess.TimeoutExpired: Command '['/opt/scancode-toolkit-2.0.1/scancode', '/tmp/license/pyUnRAR2-0.99.6/']' timed out after 2 seconds

real    0m29.803s
user    0m0.055s
sys     0m0.008s

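The trace suggests why timeout=2 still took ~29 s: in Python 3.5, `subprocess.run()` (which `check_output()` calls) catches the first `TimeoutExpired`, calls `process.kill()`, and then calls `communicate()` again before re-raising, so it waits until the stdout pipe is closed. If `scancode` is a wrapper that starts other processes which inherit that pipe (an assumption here, not verified for scancode-toolkit 2.0.1), killing only the direct child leaves them running until the scan finishes. A minimal POSIX-only sketch of a workaround, assuming it is acceptable to kill every process in the child's process group; `run_with_timeout` is a hypothetical helper, not part of the worker:

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd and return its stdout; on timeout, kill its whole process group.

    Hypothetical helper. Assumes cmd may spawn descendants that inherit the
    stdout pipe, so killing only the direct child would not make
    communicate() return promptly.
    """
    # start_new_session=True makes the child call setsid(), so it leads a
    # new process group and its descendants join that group by default.
    with subprocess.Popen(cmd, stdout=subprocess.PIPE,
                          start_new_session=True) as proc:
        try:
            out, _ = proc.communicate(timeout=timeout)
        except subprocess.TimeoutExpired:
            try:
                # Kill every process in the group, not just the direct child,
                # so nothing is left holding the pipe open.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the whole group already exited
            proc.communicate()  # returns once the pipe is closed
            raise  # re-raise the original TimeoutExpired
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd, output=out)
    return out
```

Because the child's PID is also its process-group ID after `setsid()`, `os.killpg(proc.pid, ...)` reaches its descendants, unless one of them moves itself into a different group.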
msrb commented 7 years ago

The message times out after 30 minutes, another worker picks it up and also chokes, and so on... and suddenly the whole system is on hold.
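This is a poison-message loop: every redelivery blocks another worker for the full timeout. A toy, in-memory model of capping redeliveries, assuming a hypothetical `drain` helper and `max_attempts` limit; it is not how selinon or the message broker is actually configured, it only illustrates why a cap bounds the damage:

```python
from collections import deque


def drain(queue, handle, max_attempts):
    """Process messages, parking any message that fails max_attempts times.

    Hypothetical model: a failed (or timed-out) message goes back to the end
    of the queue, like a broker redelivering it to another worker.
    """
    attempts = {}
    done, dead = [], []
    while queue:
        msg = queue.popleft()
        attempts[msg] = attempts.get(msg, 0) + 1
        if handle(msg):
            done.append(msg)
        elif attempts[msg] >= max_attempts:
            dead.append(msg)  # park it instead of redelivering forever
        else:
            queue.append(msg)  # redeliver to the next "worker"
    return done, dead


done, dead = drain(deque(["a", "pypi:py-unrar2:0.99.6", "b"]),
                   handle=lambda m: m != "pypi:py-unrar2:0.99.6",
                   max_attempts=3)
```

Without the cap, the loop above would never terminate for a message that always fails, which mirrors the stall described in the thread.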

fridex commented 7 years ago

Related: https://github.com/selinon/selinon/issues/31

pombredanne commented 7 years ago

@msrb Let me check that out on the scancode side.

pombredanne commented 7 years ago

ScanCode chokes on a ~100MB test text file full of zeroes.

pombredanne commented 7 years ago

Actually, this file contains the '0' character repeated 100M times.
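One way to keep a file like this from stalling the worker is to look for unusually large files before handing the unpacked package to ScanCode. This is a hypothetical sketch: `find_oversized_files` and the size threshold are assumptions for illustration, and it is not necessarily what work-around #181 does.

```python
import os


def find_oversized_files(root, max_bytes):
    """Return sorted paths under root whose size exceeds max_bytes.

    Hypothetical pre-scan filter: flagged files could be skipped or scanned
    separately instead of being passed to the license scanner.
    """
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so only regular files in the package are measured.
            if os.path.islink(path):
                continue
            if os.path.getsize(path) > max_bytes:
                oversized.append(path)
    return sorted(oversized)
```

A size check is cheap because it only reads file metadata, so it costs almost nothing compared to a scan that can run for many minutes.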

msrb commented 7 years ago

@pombredanne ah, very good catch. You're right - that's the reason why it takes so long. Thanks :)

pombredanne commented 7 years ago

I am working on a fix in https://github.com/nexB/scancode-toolkit/issues/712

jpopelka commented 7 years ago

Work-around #181 is in place, closing.