Closed secDre4mer closed 1 year ago
In case it's relevant, some background on what led to this PR: we use a ruleset of ~20000 YARA rules and noticed that, at this size, condition evaluation takes up a serious share of CPU time (~50-70% of total YARA scan time). This PR tries to reduce the overhead created by conditions (in our case, ~90% of all rules fall into the "only evaluate if a string matches" category introduced by the PR). Initial timings look good, with condition evaluation time dropping by ~65% in our case.
Some additional optimizations (e.g. tracking which strings actually have to match for the condition to possibly be true, or tracking the number of string matches) could be implemented if considered worthwhile.
Optimize the common case where a YARA condition has the form "... and 1 of them and ...", in other words, where the condition can only be true if at least one string matches. By flagging these rules and recording in a bitmap whether a string match occurred, condition evaluation for these rules can be skipped entirely in most cases.
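To illustrate the idea only, here is a minimal Python sketch. It is not the code from this PR and does not use YARA's internal data structures: `Rule`, `requires_string_match`, and `scan` are hypothetical names, string matching is reduced to a substring test, and the bitmap is a plain integer with one bit per rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Rule:
    # Hypothetical stand-in for a compiled rule, not YARA's own structure.
    name: str
    strings: Dict[str, bytes]
    condition: Callable[[Set[str]], bool]
    # Assumed to be set at compile time when the condition can only be
    # true if at least one of the rule's strings matched.
    requires_string_match: bool = False


def scan(rules: List[Rule], data: bytes) -> Tuple[List[str], int]:
    """Return (names of matching rules, number of conditions evaluated)."""
    matched_bitmap = 0  # bit i is set if any string of rules[i] matched
    matches_per_rule: List[Set[str]] = []

    # Phase 1: string matching (simplified here to a substring test).
    for index, rule in enumerate(rules):
        matched = {ident for ident, pattern in rule.strings.items()
                   if pattern in data}
        matches_per_rule.append(matched)
        if matched:
            matched_bitmap |= 1 << index

    # Phase 2: condition evaluation, skipping flagged rules without a match.
    results: List[str] = []
    evaluated = 0
    for index, rule in enumerate(rules):
        if rule.requires_string_match and not (matched_bitmap >> index) & 1:
            continue  # the condition cannot be true, so do not evaluate it
        evaluated += 1
        if rule.condition(matches_per_rule[index]):
            results.append(rule.name)
    return results, evaluated
```

In this sketch, a flagged rule whose strings never occur in the scanned data costs a single bit test in the second phase instead of a full condition evaluation, while rules without the flag are always evaluated as before.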