Neo23x0 / panopticon

A YARA Rule Performance Measurement Tool
MIT License

Confusing comment in Dantes_YARA_Inferno #10

Open danielmoder opened 1 year ago

danielmoder commented 1 year ago

On line 25 of bad_rule.yar, this comment is confusing:

// short atom and not fixed with e.g. "$mz at 0" in the condition

I was under the impression that the engine would still search the whole file for that string, regardless of any fixed location specified in the condition.

Can you clarify why adding the condition $mz at 0 would improve performance?

Neo23x0 commented 1 year ago

$mz at 0 would be transformed into a representation of uint16(0) == 0x5a4d. This wouldn't be the case for, e.g. $mz in (0..100), IIRC.

danielmoder commented 1 year ago

A quick trial shows a significant difference in runtime between $mz at 0 and uint16(0) == 0x5a4d, which suggests that no such transformation happens (at least in v4.2.3). The data is obviously not representative of normal files, but it highlights that the former still searches the whole file for "MZ", which is what I thought the rule was trying to show as the biggest performance hit.

Does this align with your understanding as well? I hadn't heard anything about this sort of post-processing/transformation, but I'm curious to hear more if you remember where you saw it.

import yara
yara.YARA_VERSION
>  '4.2.3'
# Not realistic, just meant to highlight differences
sample = "MZ" * 10000
ruleset_fixed_string = """
rule fixed_string
{
    strings:
        $mz = "MZ"
        $foo = "foo"

    condition:
        $mz at 0 and $foo
}
"""

ruleset_uint16 = """
rule uint16_string
{
    strings:
        $foo = "foo"

    condition:
        uint16(0) == 0x5A4D and $foo
}
"""
rule_fixed_string = yara.compile(source=ruleset_fixed_string)
rule_uint16 = yara.compile(source=ruleset_uint16)
%timeit matches = rule_fixed_string.match(data=sample)
>    76.1 µs ± 1.34 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
%timeit matches = rule_uint16.match(data=sample)
>    25.8 µs ± 920 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
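
One way to picture why the two conditions could differ in cost is the toy model below. It is only a sketch under an assumption: that a string-based condition first collects every occurrence of the string and only then checks the offset, while uint16(0) reads two bytes at a fixed position. It is not libyara's actual implementation, and the function names scan_then_filter and read_at_offset are made up for illustration.

```python
from typing import Tuple

# Conceptual model only: this is NOT how libyara is implemented.
# It contrasts "find every occurrence, then filter by offset" with
# "read two bytes at a fixed offset".


def scan_then_filter(data: bytes, needle: bytes, offset: int) -> Tuple[bool, int]:
    """Collect every occurrence of needle, then check whether one is at offset.

    Returns (matched, number_of_occurrences_found).
    """
    hits = []
    pos = data.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = data.find(needle, pos + 1)
    return offset in hits, len(hits)


def read_at_offset(data: bytes, offset: int, expected: int) -> bool:
    """Compare a little-endian 16-bit value at offset, similar in spirit to uint16(offset)."""
    if offset + 2 > len(data):
        return False
    return int.from_bytes(data[offset:offset + 2], "little") == expected


# Same unrealistic sample as in the timing test, as bytes.
sample = b"MZ" * 10000

matched, occurrences = scan_then_filter(sample, b"MZ", 0)
print(matched, occurrences)                 # work grows with the number of occurrences
print(read_at_offset(sample, 0, 0x5A4D))    # constant work, independent of file size
```

In this model the first function does work proportional to the number of "MZ" occurrences in the data, while the second does the same amount of work for any input size. Whether YARA behaves this way internally for a given version is exactly what the timings above are probing.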

Neo23x0 commented 1 year ago

Someone once told me that, but I don't remember who it was.

The performance impact also depends on the scanned data, and "performance impact" has many shades: additional CPU cycles, additional memory usage ...

Using a short atom like "MZ" could have less impact than { 00 00 00 00 00 00 00 }.
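
A rough way to see how data-dependent this is: count how often each pattern occurs in different kinds of input. The sketch below uses plain Python and two invented inputs (zero_padded and text_like are hypothetical, not taken from any test set). It counts full-pattern hits, whereas YARA's matcher works on short atoms extracted from each string, so real numbers would differ.

```python
def count_overlapping(data: bytes, pattern: bytes) -> int:
    """Count occurrences of pattern in data, including overlapping ones."""
    count = 0
    pos = data.find(pattern)
    while pos != -1:
        count += 1
        pos = data.find(pattern, pos + 1)
    return count


MZ = b"MZ"
ZEROS = bytes(7)  # same bytes as { 00 00 00 00 00 00 00 }

# Hypothetical inputs, chosen only to show the contrast.
inputs = {
    "zero_padded": b"MZ" + bytes(4096),
    "text_like": b"The quick brown fox MZ jumps over the lazy dog. " * 100,
}

for name, data in inputs.items():
    # Prints: input name, number of "MZ" hits, number of 7-zero-byte hits
    print(name, count_overlapping(data, MZ), count_overlapping(data, ZEROS))
```

On zero-padded input the zero-byte pattern produces thousands of candidate positions and "MZ" only one; on text-like input the relation flips. Which pattern is "worse" therefore depends on what is being scanned.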

In our tests $regex2 had an impressive performance impact.

Neo23x0 commented 1 year ago

BTW: we also found out just recently that the malloc() used in libmusl doesn't work well with YARA's PE module, and its use causes a lot of overhead. Using our own malloc() reduced scan duration by 30-30%.

What I mean is that measurements depend on many different input variables.