CERT-Polska / karton

Distributed malware processing framework based on Python, Redis and S3.
https://karton-core.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Fix negated filters logic: non-boolean AND/OR #247

Closed psrok1 closed 5 months ago

psrok1 commented 6 months ago

Long story short

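Original filter logic was simple: a task matched if any filter in the list matched (OR), and a filter matched if all of its header values matched (AND).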

But then we started supporting negated filters (https://github.com/CERT-Polska/karton/pull/179) and simply defined a negative match as no match. That doesn't work because (as @Antelox found) !linux OR !windows matches windows, making that filter a no-op.
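To illustrate the problem, here is a minimal sketch of such a naive evaluation. It assumes exact string comparison and uses hypothetical helper names (`naive_condition`, `naive_matches`); it is not the code from the linked PR.

```python
from typing import Dict, List


def naive_condition(headers: Dict[str, str], key: str, expected: str) -> bool:
    # Hypothetical helper: a negated value ("!x") simply means "not equal to x".
    if expected.startswith("!"):
        return headers.get(key) != expected[1:]
    return headers.get(key) == expected


def naive_matches(headers: Dict[str, str], filters: List[Dict[str, str]]) -> bool:
    # Plain boolean logic: OR over filters, AND over the values in each filter.
    return any(
        all(naive_condition(headers, key, value) for key, value in flt.items())
        for flt in filters
    )


filters = [{"platform": "!linux"}, {"platform": "!windows"}]
# "windows" is not "linux", so the first filter already matches.
print(naive_matches({"platform": "windows"}, filters))
# "linux" is not "windows", so the second filter matches.
print(naive_matches({"platform": "linux"}, filters))
```

Under these assumptions every platform passes at least one of the two filters, so the negations exclude nothing.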

The negated filter logic was then fixed by https://github.com/CERT-Polska/karton/pull/223, but @ups1decyber found corner cases where the proposed algorithm doesn't work as expected (https://github.com/CERT-Polska/karton/issues/246).

What was changed?

After analyzing the case, I found that we still want to follow AND/OR logic, but we need a special value for a negative match.

[{"type": "sample", "platform": "win32"}, {"type": "different", "platform": "!win32"}]

means: "type MUST BE sample AND platform MUST BE win32" OR "type MUST BE different AND platform CAN'T BE win32"

So !linux OR !windows does match windows, but it's a special kind of match: a negative match. Mismatched values (0) should still follow AND/OR logic, but matches should carry a sign that determines whether it's a positive match or a negative match (1 or -1). A negative match overrides a positive one, because a positive match may simply mean that a filter has no condition on that specific header value.

Finally, the non-boolean result is converted to a boolean value: False for a mismatch (0) and a negative match (-1), True for a positive match (1).
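The rules above can be sketched roughly as follows. This is a simplified illustration and not the code in this PR: the names `match_filter` and `matches_filters_sketch` are hypothetical, values are compared by exact string equality, and skipping a negated condition when the header is missing is an assumption.

```python
from typing import Dict, List

POSITIVE, MISMATCH, NEGATIVE = 1, 0, -1


def match_filter(headers: Dict[str, str], flt: Dict[str, str]) -> int:
    """AND over the values of one filter, returning 1, 0 or -1."""
    result = POSITIVE
    for key, expected in flt.items():
        negated = expected.startswith("!")
        value = expected[1:] if negated else expected
        if key not in headers:
            if negated:
                continue  # assumption: a negated value ignores a missing header
            return MISMATCH
        equal = headers[key] == value
        if equal and negated:
            result = NEGATIVE  # header hit a negated value: negative match
        elif not equal and not negated:
            return MISMATCH  # a required value differs: mismatch wins the AND
    return result


def matches_filters_sketch(headers: Dict[str, str], filters: List[Dict[str, str]]) -> bool:
    """OR over filters, where any negative match overrides positive ones."""
    outcome = MISMATCH
    for flt in filters:
        result = match_filter(headers, flt)
        if result == NEGATIVE:
            return False
        if result == POSITIVE:
            outcome = POSITIVE
    # Only a positive match (1) converts to True.
    return outcome == POSITIVE


negated = [{"platform": "!linux"}, {"platform": "!windows"}]
print(matches_filters_sketch({"platform": "windows"}, negated))
print(matches_filters_sketch({"platform": "macos"}, negated))
```

With this sketch, `!linux OR !windows` rejects both linux and windows and accepts anything else. In the two-filter example above, a task with type `different` and platform `win32` is rejected by the negative match in the second filter.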

In addition, I added test cases that should cover specific corner cases, including those found by @ups1decyber.

ups1decyber commented 6 months ago

Hi @psrok1, thank you for looking into the issue! To me it looks like the bug is fixed!

nazywam commented 6 months ago

@msm-cert was more involved in the discussion, so I'll leave the review up to him. But if the filter matching is that nontrivial, I'm wondering whether we should include a paragraph or two about it in the documentation?