VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

Fast Scan mode not working as intended when using `x of them` conditions #2093

Closed tlansec closed 3 days ago

tlansec commented 5 days ago

**Describe the bug** When using a set of strings together with fast-scan mode, YARA records the offsets of all matching strings, contrary to the behavior the documentation suggests.

**To Reproduce** Use the following rule to scan the YARA binary itself:

rule myrule
{
strings:
        $s1 = "yara"
        $s2 = "help"
        $s3 = "lol"

condition:
        2 of ($s*)
}

Results in:

λ yara tmp.yar yara64.exe -s --fast-scan
myrule yara64.exe
0x19d54c:$s1: yara
0x19f33c:$s1: yara
0x1a0a5c:$s1: yara
0x1a1fc4:$s1: yara
0x1a28f4:$s1: yara
0x1a2a34:$s1: yara
0x1a2a64:$s1: yara
0x1a2acc:$s1: yara
0x1a3364:$s1: yara
0x1a35e4:$s1: yara
0x1a3bbc:$s1: yara
0x1a6e5c:$s1: yara
0x1c53e4:$s1: yara
0x1c548c:$s1: yara
0x213ae7:$s1: yara
0x213c20:$s1: yara
0x1de8a5:$s2: help
0x1de8b4:$s2: help
0x1f6241:$s2: help
0x2103d2:$s2: help
0x213c57:$s2: help

**Expected behavior** I would expect it to record only the first instance of $s1 and $s2, e.g.:


λ yara tmp.yar yara64.exe -s --fast-scan
myrule yara64.exe
0x19d54c:$s1: yara
0x1de8a5:$s2: help
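
As a possible stopgap, the `-s` output can be reduced to the first instance of each string after the scan. The sketch below is only an illustration: it assumes the `offset:$identifier: text` line format shown above, and `first_matches` is a hypothetical helper, not part of YARA or yara-python.

```python
import re

# Assumed line format, taken from the `yara -s` output above:
#   0x19d54c:$s1: yara
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):(\$\w+): (.*)$")


def first_matches(output):
    """Map each string identifier to its lowest (offset, text) pair.

    Hypothetical helper for post-processing; lines that do not look like
    string matches (e.g. the "myrule yara64.exe" header) are skipped.
    """
    first = {}
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        offset, ident, text = int(m.group(1), 16), m.group(2), m.group(3)
        # Keep the smallest offset rather than trusting the output order.
        if ident not in first or offset < first[ident][0]:
            first[ident] = (offset, text)
    return first


sample = """myrule yara64.exe
0x19d54c:$s1: yara
0x19f33c:$s1: yara
0x1de8a5:$s2: help
0x1de8b4:$s2: help
"""

for ident, (offset, text) in sorted(first_matches(sample).items()):
    print(f"{hex(offset)}:{ident}: {text}")
```

This only trims what is displayed; the scan itself still locates every instance.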

**Please complete the following information:**
 - OS: Windows 10 x64
 - YARA version: v4.5.1

plusvic commented 4 days ago

My past self comes to the rescue: https://github.com/VirusTotal/yara/commit/4de3d574bae5973c711095c1c755166c07dec322

As explained in the commit message, there were some false negatives in fast-mode with expressions like:

any of <string_set> in <range>
any of <string_set> at <offset>

The quick and dirty solution for the false negatives was to find all instances of the strings in <string_set>, even when that is not strictly required, as in expressions like x of <string_set>. The underlying problem is that at the point in the code where this decision is made, we don't have information about the kind of expression being parsed. Fixing this issue would require non-trivial changes.
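
To make the trade-off concrete, here is a toy model in Python. It is not YARA's implementation, and every name in it (`find_offsets`, `any_in_range`, `n_of`) is made up for illustration. It shows why recording only the first instance of each string is enough for `x of <string_set>` but can miss a match for `any of <string_set> in <range>`.

```python
from typing import Dict, List

# Toy model only: this does NOT reflect YARA's internals.


def find_offsets(data: bytes, pattern: bytes, first_only: bool) -> List[int]:
    """Return offsets of pattern in data; stop after the first hit if first_only."""
    offsets = []
    start = data.find(pattern)
    while start != -1:
        offsets.append(start)
        if first_only:
            break
        start = data.find(pattern, start + 1)
    return offsets


def any_in_range(matches: Dict[str, List[int]], lo: int, hi: int) -> bool:
    """Model of `any of <string_set> in (lo..hi)`: needs the actual offsets."""
    return any(lo <= off <= hi for offs in matches.values() for off in offs)


def n_of(matches: Dict[str, List[int]], n: int) -> bool:
    """Model of `n of <string_set>`: only needs to know which strings matched."""
    return sum(1 for offs in matches.values() if offs) >= n


data = b"yara....help....yara"  # "yara" at 0 and 16, "help" at 8
patterns = {"$s1": b"yara", "$s2": b"help"}

first_only = {name: find_offsets(data, p, True) for name, p in patterns.items()}
all_hits = {name: find_offsets(data, p, False) for name, p in patterns.items()}

# `2 of ($s*)` gives the same answer either way.
print(n_of(first_only, 2), n_of(all_hits, 2))

# `any of ($s*) in (14..19)` is missed when only first instances are kept.
print(any_in_range(first_only, 14, 19), any_in_range(all_hits, 14, 19))
```

In this model the second "yara" at offset 16 is the only match inside the range, so stopping at the first instance turns a true match into a false negative.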

tlansec commented 3 days ago

Hey Victor,

Thanks for the explanation. It makes sense to avoid false negatives in this scenario, and if there isn't an easy way to fix it then I suppose we should mark this issue as resolved.

Cheers, Tom