VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License
8.29k stars 1.45k forks source link

YARA doesn't match on non-PE files if pe.imphash() is compared #1571

Closed secDre4mer closed 3 years ago

secDre4mer commented 3 years ago

Minimal example:

rule Not_specific_binary {
   condition:
      not pe.imphash() == "xxx"
}

This rule should match (and used to match, tested in YARA v4.0.2) on arbitrary files, e.g. itself.

However, since (at least) YARA v4.1.0, it no longer matches these files. It still matches on PE files.

This is especially problematic if it is used as a false positive exclusion, e.g.:

rule My_YARA_Rule {
   strings:
        // Some strings...
   condition:
      all of them and not pe.imphash() == "xxx"
}
plusvic commented 3 years ago

This issue has been discussed multiple times (see #1385) and the current behavior looks like the most reasonable one when considering all the edge cases. The problem here is that pe.imphash() is undefined for non-PE files, so the result for not pe.imphash() == "xxx" is also undefined and therefore it evaluates to false.

I think it should be fine if not pe.imphash() == "xxx" evaluates to false for non-PE files, this condition only makes sense for PE files, and I wouldn't expect anyone trying to match non-PE files with a rule that uses pe.imphash().

In the case of false positive exclusion (all of them and not pe.imphash() == "xxx") the logic works the same, if the file is non-PE the condition is false, which should be fine for a rule targeting PE files, and if the file is PE the rule works as expected.

Could you provide more information about how did you get into this issue? It was a real-life rule that you were writing and didn't work as expected? I would like to know more about how people reason about this.

secDre4mer commented 3 years ago

Thanks for the link regarding the previous discussion about this behaviour.

In our case, it was a real-life rule for a common hack tool that existed for a long time and had the and not pe.imphash() == ... part added at some point to exclude a false positive that we encountered. We have a baseline of expected matches on an internal malware repository to guard against regressions when rules are modified. After upgrading the YARA version used in our tests to 4.1.2, the rule no longer matched on some non-PE-Files (e.g. Files with a corrupt or missing PE header, a version that was embedded into a JPG, ... - real-life samples we encountered at some time), which caused our tests to fail.

We also found that this can happen with other conditions that do not use the PE module, e.g. this: not uint32(uint32(0x3c)) == 0x4550 This is especially insidious because this is not something you'd expect to break the rule. It looks like a simple, negated check for a PE header. However, in the cases where no PE header exists, the inner uint32(...) commonly evaluates to something that exceeds the file size, thus the outer uint32(...) is undefined, and thus the rule does not match.

A partial solution that might fix at least some of these "works as intended, not as expected" issues would be to say that undefined == ... always evaluates to false and undefined != ... always evaluates to true. I think that this is more like "expected" behaviour. It's explicitly only a partial solution since issues with comparison operators can still arise. I agree with the previous discussion that there's no clean cut solution here - any comparison with undefined will likely cause some "this is weird" behaviour at some point.

plusvic commented 3 years ago

At some point I considered making undefined == ... always false and undefined != ... always true (https://github.com/VirusTotal/yara/issues/1385#issuecomment-754568828). But the inconsistencies it introduces in the logic are even worse than the solution, that's why I reverted that change in the end.

In fact YARA has historically behaved like it does with the latest version, the regression you saw was probably between version 4.0 and 4.1. It was in version 4.0 when I tried to "fix" this problem and the solution was even worse. For that reason I prefer being cautious and not making the same mistake again, the current behaviour as some quirks, but at least they are the same quirks as always.

One solution that some people have proposed in the past is introducing an operator that allow to check for undefined values. That's something that I may consider although I'm not really convinced as it puts the burden in the people who writes the rules.

secDre4mer commented 3 years ago

Yes, I see. The status quo remains unsatisfying, but I don't see a better solution.

That's something that I may consider although I'm not really convinced as it puts the burden in the people who writes the rules.

The status quo already puts the burden on the people writing the rules (point in case: this issue) because it requires them to be aware of the interactions with undefined in some cases. Having some way to test for "undefinedness" would at least help with handling undefinedness in the way that the rule writer wants. Right now, I'm seriously considering to rewrite our conditions (e.g. the above) like this: not (not all of them or pe.imphash() == "xxx") and adding a simpler test for undefinedness would at least make this more readable.

plusvic commented 3 years ago

Another option for solving you case is:

all of them and (not pe.is_pe() or pe.imphash() != "xxx")

With the rule the intentions is clear, the file must contain all the strings, but it must be a non-PE file or, if the is a PE, the imphash should not be "xxx"

tlansec commented 3 years ago

I think a reasonable solution for this and others issues around the same issue would be to have an exists keyword, then the condition in the authors rule would be:

all of them and
exists pe.imphash and 
not (
    pe.imphash() == "foo" or 
    pe.imphash() == "bar"
)

This solves for cases which are not imphash based i.e. where other PE attributes may be undefined, but where pe.is_pe() would return True,and allows for easy readability in the rule.

tlansec commented 3 years ago

@metthal pointed out that there is a pull request here that approximately achieves the goals of the suggestion above:

https://github.com/VirusTotal/yara/pull/1529

plusvic commented 3 years ago

1529 has been merged.