CERT-Polska / karton

Distributed malware processing framework based on Python, Redis and S3.
https://karton-core.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
392 stars 45 forks source link

Option to set gc parameters in config #161

Closed rakovskij-stanislav closed 2 years ago

rakovskij-stanislav commented 2 years ago

Issue: https://github.com/CERT-Polska/karton/issues/147

Config example:

[minio]
secret_key = minioadmin
access_key = minioadmin
address = localhost:9000
bucket = karton
secure = 0

[redis]
host=localhost
port=6379

[mongo]
host=localhost
port=27017

[task_timeouts]
GC_INTERVAL = 60
TASK_STARTED_TIMEOUT =2592000
TASK_CRASHED_TIMEOUT = 2592000
rakovskij-stanislav commented 2 years ago

@nazywam please help me to conquer lint checker. PyCharm does not find any lint mistakes)

rakovskij-stanislav commented 2 years ago

Made the ability to safe cumulate or multiply values (so syntax like 7*24*3600 will be good), and reviewed usages of uppercased constants in order to redefine them. I'll make the linter things, wait a moment)

rakovskij-stanislav commented 2 years ago

Fm, I cannot solve mypy things - it says about Optional[Config] issues and sympy module (despite the last one is builtin)

psrok1 commented 2 years ago

I don't think it's a good idea to put such heavy dependency as sympy there. karton is meant to be pretty lightweight library.

Evaluation of math expressions in config sounds to be out of scope for this PR, so if we can't do that via some safe, built-in dedicated Python function, let's keep it simple and avoid that kind of stuff.

rakovskij-stanislav commented 2 years ago

I don't think it's a good idea to put such heavy dependency as sympy there. karton is meant to be pretty lightweight library.

Evaluation of math expressions in config sounds to be out of scope for this PR, so if we can't do that via some safe, built-in dedicated Python function, let's keep it simple and avoid that kind of stuff.

Okay, but config parsing is one-time thing that will occur on start of processing, this way it can slow karton-core only on start, but not on processing. I do not want to use eval because it's simply a bad practice (despite I suppose we can trust config file content). 'K. removing sympy. But for user convenience we probably should consider to allow users to use math things.

psrok1 commented 2 years ago

Actually, sympify uses eval as well (https://docs.sympy.org/latest/modules/core.html#sympy.core.sympify.sympify, see Warning).

By "heavy" I mean the count and weight of additional dependencies that may involve dependency conflicts in future and additional work on maintenance, not strictly the time of evaluation. Also using symbolic mathematics library just for math expression evaluation doesn't sound right.

Unfortunately I don't see any simple and safe solutions for that (I agree that eval is a bad idea here), so I think it doesn't improve user experience that much to bother about proper math evaluation here. Cost-benefit factor for that feature is too bad for me.

rakovskij-stanislav commented 2 years ago

Hello! Sorry for delay. Sympy removed

rakovskij-stanislav commented 2 years ago
Run mypy --namespace-packages karton/
karton/system/system.py:43: error: Item "None" of "Optional[Config]" has no attribute "config"
karton/system/system.py:45: error: Value of type "Optional[Config]" is not indexable
karton/system/system.py:48: error: Value of type "Optional[Config]" is not indexable
karton/system/system.py:53: error: Value of type "Optional[Config]" is not indexable
karton/system/system.py:58: error: Value of type "Optional[Config]" is not indexable
Found 5 errors in 1 file (checked 16 source files)

Mypy incorrectable (by me) errors. Seems legit

rakovskij-stanislav commented 2 years ago

@nazywam, @psrok1, Hello, I'd like to proceed this pull request faster)

psrok1 commented 2 years ago

By looking at the code, it's not legit and mypy is right:

    def __init__(
        self,
        config: Optional[Config],
        enable_gc: bool = True,
        enable_router: bool = True,
        gc_interval: int = GC_INTERVAL,
    ) -> None:
        self.gc_interval = gc_interval
        self.task_dispatched_timeout = self.TASK_DISPATCHED_TIMEOUT
        self.task_started_timeout = self.TASK_STARTED_TIMEOUT
        self.task_crashed_timeout = self.TASK_CRASHED_TIMEOUT

        if config.config.has_section("karton-system"):
            self.gc_interval = int(
                config["task_timeouts"].get("GC_INTERVAL", self.gc_interval)
            )
            self.task_dispatched_timeout = int(
                config["task_timeouts"].get(
                    "TASK_DISPATCHED_TIMEOUT", self.task_dispatched_timeout
                )
            )
            self.task_started_timeout = int(
                config["task_timeouts"].get(
                    "TASK_STARTED_TIMEOUT", self.task_started_timeout
                )
            )
            self.task_crashed_timeout = int(
                config["task_timeouts"].get(
                    "TASK_CRASHED_TIMEOUT", self.task_crashed_timeout
                )
            )
        super(SystemService, self).__init__(config=config)
        self.last_gc_trigger = time.time()
        self.enable_gc = enable_gc
        self.enable_router = enable_router

You're using config which is Optional[...] so can be None. And the default is set up by super constructor which is called too late. It's not a serious issue because that constructor is used in one way (by actually providing Config as an argument), but it violates the base class contract.

But no problem, I'll fix that for you 😉

rakovskij-stanislav commented 2 years ago

You're using config which is Optional[...] so can be None.

It is not my code :) But thanks anyway!