fkie-cad / Logprep

Log data preprocessing, generation and shipping in Python
https://logprep.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Fix docker image and avoid implicit setuptools runtime dependency #683

Closed P4sca1 closed 1 month ago

P4sca1 commented 1 month ago

https://github.com/fkie-cad/Logprep/pull/682 introduced a regression: the logprep binary is not available in the final image. It is installed with the system-wide pip, which places the binary in /opt/bitnami/python/bin/ instead of /opt/venv/bin/. pip had been removed from the venv because it is not needed at runtime. The logprep --version check, which is meant to ensure that logprep is properly installed, still passed because /opt/bitnami/python/bin/ is also in PATH. To avoid such issues in the future, I changed the command to explicitly use /opt/venv/bin/logprep.
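The masking effect of PATH can be reproduced outside Docker. The following is a minimal sketch, not code from this PR: the helper names `make_fake_binary` and `installed_in_venv` are hypothetical, the directories are throwaway stand-ins for /opt/venv/bin/ and /opt/bitnami/python/bin/, and a POSIX system is assumed.

```python
import os
import shutil
import stat
import tempfile
from pathlib import Path


def make_fake_binary(directory: Path, name: str) -> Path:
    """Create a tiny executable script (hypothetical helper, illustration only)."""
    directory.mkdir(parents=True, exist_ok=True)
    binary = directory / name
    binary.write_text("#!/bin/sh\necho fake\n")
    binary.chmod(binary.stat().st_mode | stat.S_IXUSR)
    return binary


def installed_in_venv(name: str, venv_bin: Path, search_path: str) -> bool:
    """Return True only if the first match on search_path lives in venv_bin."""
    found = shutil.which(name, path=search_path)
    return found is not None and Path(found).parent.resolve() == venv_bin.resolve()


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    venv_bin = root / "venv" / "bin"      # stands in for /opt/venv/bin/
    system_bin = root / "system" / "bin"  # stands in for /opt/bitnami/python/bin/
    venv_bin.mkdir(parents=True)
    make_fake_binary(system_bin, "logprep")

    search_path = os.pathsep.join([str(venv_bin), str(system_bin)])
    # A bare name lookup succeeds even though the venv has no binary ...
    print(shutil.which("logprep", path=search_path))
    # ... while a location-aware check exposes the problem.
    print(installed_in_venv("logprep", venv_bin, search_path))
```

Calling the binary by its absolute path, as the PR does with /opt/venv/bin/logprep, has the same effect as the location-aware check: it fails as soon as the binary is missing from the venv.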

To install logprep into the venv, pip must be installed in the virtual environment and used instead of the system-wide installation from bitnami. Therefore the --without-pip flag has been removed. The --upgrade-deps flag was added to ensure that a recent, non-vulnerable version of pip is installed. We no longer remove pip from the runtime image, because if it is not installed in the venv, a (possibly outdated and vulnerable) version from the bitnami image is used.
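For readers more familiar with the Python API than the command-line flags, the same venv configuration can be sketched with the standard library's venv module. This is an illustration only; the PR changes the flags in the Dockerfile, and the function names `make_builder` and `create_venv` are hypothetical.

```python
import venv


def make_builder() -> venv.EnvBuilder:
    """Builder matching the flags described above (sketch, not the PR's code)."""
    # with_pip=True corresponds to no longer passing --without-pip.
    # upgrade_deps=True corresponds to --upgrade-deps (available since Python 3.9).
    return venv.EnvBuilder(with_pip=True, upgrade_deps=True)


def create_venv(target: str = "/opt/venv") -> None:
    """Create the environment; upgrading pip requires network access."""
    make_builder().create(target)
```

After `create_venv()` has run, the pip inside the target directory should be used for installation, so that console scripts such as logprep end up in the venv's bin directory.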

Once the proper venv was used, I experienced the following error:

Traceback (most recent call last):
  File "/opt/venv/bin/logprep", line 5, in <module>
    from logprep.run_logprep import cli
  File "/opt/venv/lib/python3.10/site-packages/logprep/run_logprep.py", line 13, in <module>
    from logprep.generator.http.controller import Controller
  File "/opt/venv/lib/python3.10/site-packages/logprep/generator/http/controller.py", line 12, in <module>
    from logprep.factory import Factory
  File "/opt/venv/lib/python3.10/site-packages/logprep/factory.py", line 6, in <module>
    from logprep.configuration import Configuration
  File "/opt/venv/lib/python3.10/site-packages/logprep/configuration.py", line 6, in <module>
    from logprep.registry import Registry
  File "/opt/venv/lib/python3.10/site-packages/logprep/registry.py", line 33, in <module>
    from logprep.processor.grokker.processor import Grokker
  File "/opt/venv/lib/python3.10/site-packages/logprep/processor/grokker/processor.py", line 47, in <module>
    from logprep.processor.grokker.rule import GrokkerRule
  File "/opt/venv/lib/python3.10/site-packages/logprep/processor/grokker/rule.py", line 55, in <module>
    from logprep.util.grok.grok import GROK, ONIGURUMA, Grok
  File "/opt/venv/lib/python3.10/site-packages/logprep/util/grok/grok.py", line 35, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'

It turns out Logprep has an implicit runtime dependency on setuptools because it imports pkg_resources. This is bad practice because it leads to issues like the one above. pkg_resources is deprecated, and the standard library has shipped importlib.resources as an alternative since Python 3.7. Because Logprep requires Python 3.10 or newer, we can safely replace pkg_resources, which allows running Logprep without setuptools installed at runtime.
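A minimal sketch of what such a replacement can look like, assuming the package data is read as text. It is not the actual change to logprep/util/grok/grok.py: the package name `demo_grok_pkg`, the file `patterns/base.txt`, and both helper functions are hypothetical and exist only to make the example self-contained. It uses `importlib.resources.files`, which is available since Python 3.9.

```python
import importlib
import sys
import tempfile
from importlib import resources
from pathlib import Path


def read_packaged_text(package: str, *parts: str) -> str:
    """Read a data file shipped inside ``package`` using only the stdlib.

    With setuptools this was typically done through
    pkg_resources.resource_string(package, name).
    """
    resource = resources.files(package)
    for part in parts:
        resource = resource / part
    return resource.read_text(encoding="utf-8")


def make_demo_package(root: Path) -> str:
    """Create a throwaway package with one data file (illustration only)."""
    pkg = root / "demo_grok_pkg"
    (pkg / "patterns").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "patterns" / "base.txt").write_text("WORD \\b\\w+\\b\n")
    sys.path.insert(0, str(root))  # make the demo package importable
    importlib.invalidate_caches()
    return pkg.name


if __name__ == "__main__":
    name = make_demo_package(Path(tempfile.mkdtemp()))
    print(read_packaged_text(name, "patterns", "base.txt"))
```

Because nothing here imports pkg_resources, the lookup works in a virtual environment that does not contain setuptools.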