blacklanternsecurity / bbot

A recursive internet scanner for hackers.
https://www.blacklanternsecurity.com/bbot/
GNU General Public License v3.0

Numpy import error on tests #1467

Status: Closed (closed by liquidsec 2 weeks ago)

liquidsec commented 2 weeks ago

All tests are currently failing with the following error, which suggests that an upstream dependency update broke this import:

[ERRR] Error in unstructured.handle_event(FILESYSTEM("{'path': '/tmp/.bbot_test/scans/testexcavaterawdata_test_g2ykldx164/filedownload...", module=filedownload, tags={'in-scope', 'filedownload', 'file'})): /home/runner/work/bbot/bbot/bbot/modules/unstructured.py:101:handle_event(): numpy.core.multiarray failed to import
[TRCE] concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/runner/work/bbot/bbot/bbot/modules/unstructured.py", line 123, in extract_text
    from unstructured.partition.auto import partition
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/auto.py", line 83, in <module>
    from unstructured.partition.pdf import partition_pdf
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/pdf.py", line 50, in <module>
    from unstructured.partition.pdf_image.pdf_image_utils import (
  File "/home/runner/.cache/pypoetry/virtualenvs/bbot-pd-UZ8Fz-py3.9/lib/python3.9/site-packages/unstructured/partition/pdf_image/pdf_image_utils.py", line 13, in <module>
    import cv2
ImportError: numpy.core.multiarray failed to import
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/runner/work/bbot/bbot/bbot/scanner/scanner.py", line 1062, in _acatch
    yield
  File "/home/runner/work/bbot/bbot/bbot/modules/base.py", line 629, in _worker
    await handle_event_task
  File "/home/runner/work/bbot/bbot/bbot/modules/unstructured.py", line 101, in handle_event
    content = await self.scan.run_in_executor_mp(extract_text, file_path)
ImportError: numpy.core.multiarray failed to import
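
The traceback shows the failure surfacing at the `import cv2` in unstructured's pdf_image_utils.py, inside a process-pool worker. One way to narrow down which upstream package changed is to record the installed versions of the packages in that import chain and retry the failing imports in a fresh interpreter, outside BBOT entirely. This is a minimal sketch: the helper names are hypothetical, and the distribution names (unstructured, opencv-python or opencv-python-headless, numpy) are assumptions about how these packages were installed.

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


def imports_cleanly(module: str) -> bool:
    """Try `import module` in a separate interpreter, so a broken native
    extension cannot leave the current process half-initialized."""
    proc = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0


if __name__ == "__main__":
    # Assumed distribution names; cv2 may be provided by either OpenCV wheel.
    print(installed_versions(
        ["unstructured", "opencv-python", "opencv-python-headless", "numpy"]
    ))
    # Walk the import chain from the traceback, bottom-up.
    for mod in ("numpy", "cv2", "unstructured.partition.auto"):
        print(mod, "OK" if imports_cleanly(mod) else "FAILED")
```

If numpy imports cleanly but cv2 fails, the breakage sits between those two packages rather than in unstructured itself.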

TheTechromancer commented 2 weeks ago

unstructured has a lot of dependencies, which makes it unwieldy. However, its functionality is really important to BBOT, so we need to find a way to prevent this kind of breakage.

As we keep adding BBOT modules, there is a growing need for a system that lets us cleanly package these larger tools.

Docker is one solution, but we should keep an eye out for something more lightweight that doesn't require a running daemon. Something like zipapp, but better?
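
For reference, the standard-library zipapp module bundles a directory of Python code into a single runnable .pyz archive. One caveat is relevant here: Python's zipimport only loads .py and .pyc files from an archive, not compiled extension modules (.so/.pyd), so a plain zipapp cannot carry packages like numpy or cv2 by itself. The sketch below uses a hypothetical pure-Python module named tool purely to show the mechanism; it is not how BBOT packages anything today.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_zipapp() -> str:
    """Bundle a tiny pure-Python module into a .pyz archive and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        # Hypothetical entry point standing in for a bundled tool.
        (src / "tool.py").write_text(
            "def main():\n    print('hello from inside the zipapp')\n"
        )
        target = Path(tmp) / "tool.pyz"
        # main="tool:main" tells zipapp to generate __main__.py for us.
        zipapp.create_archive(
            src,
            target=target,
            interpreter="/usr/bin/env python3",
            main="tool:main",
        )
        # The interpreter can execute a zip archive that has a __main__.py.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_zipapp())
```

Because of the extension-module restriction, "zipapp, but better" would need some way to unpack native code to disk before import.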

Ideally, this solution would not rely on the upstream maintainers' tests. Instead, it would cache a known-working version of the tool (including all of its dependencies) and only upgrade it once all of our own tests pass.
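
The known-good cache idea can be sketched as a pin file that is only replaced when a candidate set of versions passes the test suite. This is a minimal sketch under assumptions: pins live in a JSON file, the package name "example-dep" and its versions are made up, and run_tests is a hypothetical callable standing in for whatever would install the candidate into an isolated environment and run BBOT's tests. None of this is an existing BBOT API.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Distribution name -> pinned version, e.g. {"example-dep": "1.0.0"}.
Pins = Dict[str, str]


def load_pins(lock_path: Path) -> Pins:
    """Return the last known-good pins, or an empty mapping if none exist yet."""
    if not lock_path.exists():
        return {}
    return json.loads(lock_path.read_text())


def promote_if_green(lock_path: Path, candidate: Pins,
                     run_tests: Callable[[Pins], bool]) -> Pins:
    """Adopt `candidate` only if the tests pass against it; otherwise
    keep serving the cached known-good pins."""
    current = load_pins(lock_path)
    if candidate == current:
        return current  # nothing to upgrade
    if not run_tests(candidate):
        return current  # upstream broke something: stay on known-good
    # Write a temp file, then rename it over the lock file, so a crash
    # mid-write never leaves a half-written lock behind.
    tmp_path = lock_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(candidate, indent=2, sort_keys=True))
    tmp_path.replace(lock_path)
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = Path(d) / "pins.json"
        lock.write_text(json.dumps({"example-dep": "1.0.0"}))
        # Simulated failing test run: the pins stay where they were.
        print(promote_if_green(lock, {"example-dep": "2.0.0"}, lambda p: False))
        # Simulated passing test run: the candidate becomes the new known-good.
        print(promote_if_green(lock, {"example-dep": "2.0.0"}, lambda p: True))
```

The key design choice is that a red test run is a no-op: users keep getting the last set of versions that actually worked, regardless of what was published upstream.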

Getting a system like this in place will help us package and deploy these tools in a reproducible way across multiple Linux distros, and make sure they don't break unexpectedly when an upstream dependency collapses.

liquidsec commented 2 weeks ago

Closing the issue as it seems to have resolved itself. Someone must have fixed their upstream oopsie ⚡