VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License
8.24k stars 1.44k forks source link

Feature Request: Better support to embed assembly instruction patterns #1696

Open govcert-ch opened 2 years ago

govcert-ch commented 2 years ago

When writing yara rules for (packed) malware, it is often difficult to find unique constants or strings to look for. It usually makes more sense to search for assembly instructions typical for some unpacking code fragments. When you have a few samples, it makes sense to try to extrapolate in what other form these instructions might appear in different compiler runs, like different register selection and different instruction ordering, to increase the sensitivity of the rule and have one rule that matches all your samples, plus other potential samples, allowing for some variations.

In many of the more complex situations, I wished to have better support for writing assembly code patterns, but I'm aware architecture specific stuff should be avoided, and search performance must be kept in mind. So, full disassemblers will probably never be an option. There are still some architechture independant feature extensions that seem to make sense to me. Maybe there are simple solutions for some of them and I'm just not aware of it? So I decided to make a collections of some of these issues - maybe some are easy to solve, maybe some are not worth the effort because nobody else cares, but maybe some of these are also worth to be considered in one way or another.

In the most simple case, we try to match for some instructions where we are confident about the order in different compiler runs. This is a (imaginary) example for such a x64 code fragment with 3 instructions:

rule sample1 {
    strings:
        $1 = { 69 (C0|C9|D2|DB|E4|ED|F6|FF) 01 01 01 01  // imul <r32>, 1010101h - 0x45 prefix could be used for r8-r15
               [0-50]      // first and second instructions are not adjacent, some stuff can be in between
               C1 E8 02    // shr eax, 2
               [0-1]       // let's assume we know the next instruction is adjacent - but we still must allow for the optional 0x44 prefix byte for r8-r15
               81 (C0|C1|C2|C3|C4|C5|C6|C7) 01 01 01 01 } // add <r32>, 1010101h - 0x44 prefix could be used for r8-r15
    condition:
        $1
}

To increase sensitivity, I added byte variants (the (...) stuff) where the registers are encoded in the instructions, so the first one matches no matter which register the multiplication is actually applied to. This allows for some diversity in different compiler runs.

My first, and probably hardest to implement "wish": Embedded register encodings rarely occur exactly at nibble-boundaries, i.e. exactly where hex characters appear - hence the variant listing looks messy. It would be very nice to have a possibility to use bit-slices in hex strings, maybe something like 11xyzxyz - meaning: two 1 bits, then a sequence of three arbitrary bits encoding one register, which repeats (the repetition occurs because this is actually encoded as a 3-operand multiplication, here imul ebx, ebx, 1010101h, where 2 registers are identical). This actually corresponds to the the first instruction above. An even nicer way (I'm inventing some syntax here without much thought) would be 11|$bla=???||$bla|, which would also define a placeholder in a variable $bla that could later be used as a matcher. E.g. let's assume we knew the third instruction above used the same register, we could encode it as 11000|$bla|, which would increase the specificity of the rule. Alternatively, 3 different variables could be used, i.e. something like 11|$b1=???||$b2=???| and later 11000|$b3=???|, followed by a condition in the second part of the rule enforcing $b1==$b2 and $b2==$b3, which in my understanding corresponds more to the way Yara works.

A second, less important "wish" is about the prefix bytes 0x44 and 0x45 above, so bytes sometimes put in front of instructions for extended functionality, like accessing registers r8-r15. Hence I put an optional gap of 1 byte ([0-1]) between instructions 2 and 3. It would be nice and once more increasing the specificity if there were a possibility to express "nothing-or-0x44", instead of just "nothing-or-any-byte" in this case - in other words, something comparable to the ? suffix in regex's. I'm aware regex could actually be used for this, but that would make matching slower and increase complexity for the overall rule. Plus, a lot of other hex string features and readability would be lost.

My most important "wish" though is about the scenario where the chosen instructions can appear in any order, because another compiler run might use a different code fragmentation of the function. But it would still be reasonable to assume all instructions appear within a certain byte range, e.g. within 300 bytes, but anywhere in the code. Each of these patterns could also appear as singular patterns anywhere else. This situation was already discussed for the case of 2 normal strings here: https://github.com/VirusTotal/yara/issues/1454 - and indeed, it is possible to generalize this, but it gets quite convoluted:

rule sample2 {
    strings:
        $1 = { 69 (C0|C9|D2|DB|E4|ED|F6|FF) 01 01 01 01 } 
        $2 = { C1 E8 02 }   
        $3 = { 81 (C0|C1|C2|C3|C4|C5|C6|C7) 01 01 01 01 } 
    condition:
        for any i1 in (1..#1) : (
            for any i2 in (1..#2) : (
                for any i3 in (1..#3) : (
                    math.max(@1[i1], math.max(@2[i2], @3[i3])) - math.min(@1[i1], math.min(@2[i2], @3[i3])) < 300
                )
            )
        )
}

I first assumed this code to be horribly slow, but to my surprise, it isn't: the performance loss is below 5 percent. This is probably because the individual instructions patterns - maybe except for the middle one - are already so specific that there are not too many iterations for the corresponding loops. And as the condition evaluations happens after all strings were matched, it only takes a short time. One drawback of course is the ugliness of the rule - also because the math.max/min operators can only take 2 arguments and not a full list, plus the nested for's speak their own language. But that could probably be solved with a separate preprocessor. The bigger issue here is that Yara currently only allows 4 nesting levels of loops; if you nest deeper, you get a loop nesting limit exceeded error. I guess this limit could be adjusted, but probably not for retrohunts or live notifications in Virustotal (I'm not sure though). Maybe instead of a nesting level, the limit could rather be applied on the total number of loop runs resulting in a multiplication of the variants, but this of course is not known when the rule is compiled, but still could be applied at runtime?

Much nicer of course would be the possibility to have an additional syntax in the "condition:" part like all of them within 300 or so, doing the same - yes, I know, I'm dreaming here. Performance wise much better would be the possibility to include this into the "strings:" part, e.g. with something like this (again with some imaginary syntax):

rule sample3 { // warning, imaginary sytax
    strings:
        $1 = { <a +300>
               <a 69 (C0|C9|D2|DB|E4|ED|F6|FF) 01 01 01 01>
               <a C1 E8 02>    
               <a 81 (C0|C1|C2|C3|C4|C5|C6|C7) 01 01 01 01> } 
    condition:
        $1
}

This defines a range restriction a, with one syntax element defining it's range (300 bytes), and the other ones referencing a and giving the actual substrings. When the string parser triggers for each of the individual substrings, it updates the "last-offset" for it and checks if the "last-offset's" of all other substrings of the same group a lie within the defined range, looking backwards. The situation would be comparable to a regex lookahead. I'm aware this would probably be a tricky thing to implement, and maybe it is fine to live with the nested loop approach (as long as the nesting level is no issue), but maybe this is also worth a thought, as it would probably be more efficient than the nested loops.

jhumble commented 2 years ago

I too would find features like this incredibly helpful, and wanted provide some further examples. I've often found myself writing rules where I'm matching a couple of instructions and I want the dst register in the first instruction to match the src register in the second instruction. It can be done now, but the syntax is very ugly. Another example is locating a subpattern (for dereferencing or similar use cases) when there are variable length wildcards before the section we care about. Currently, I can basically iterate through the match in the condition in order to find where my offset is, but it's not pretty.

Example:

  $Alloc_RWX_global = {  <trimmed beginning>                                                                                                                                                                                                                                                                
                          (
                              FF 35 (?? ?? 4? 00| ?? ?? 0? 10 ) |                         // push *protect; flNewProtect == *0x3000 (checked by condition)
                              A1 (?? ?? 4? 00| ?? ?? 0? 10 ) |                            // mov <reg> [local_addr]; push; -- condition checks *local_addr == 0x3000
                              (8B|8D) (05|0D|15|1D|2D|35|3D) 
                              (?? ?? 4? 00| ?? ?? 0? 10 )                  [0-16]         // mov <reg>, *ptr
                              (
                                  (50|51|52|53|55|56|57) |                                // push <reg>
                                  89 (44|4c|54|5c|6c|74|7c) 24 ??                         // mov [esp+8], reg
                              )
                          )                                                [0-12] <trimmed remainder>

So we're matching a couple different ways of loading 0x3000 from a pointer, and pushing it onto the stack or loading it directly into [ESP+offset]. The string pattern itself does not contain that 0x3000. Instead it's matching instructions that reference an address that may contain 0x1000 or 0x3000 and the condition basically scans through the pattern looking for the A1 (load eax, ptr) or FF 35 (load , ptr). Where that will show up can vary since there are variable length wildcards between the instructions. but once it finds those bytes, the condition knows that the next 4 bytes are the relative offset. I can then use that to do some math and find the actual value stored and ensure it's 0x3000. See:

  // check that second arg is a ptr to 0x3000 or 0x1000
  for any of ($Alloc_RWX_g*):
  (   
      for any j in (1 .. 20): // rule starts with 2-5 byte instr, up to 6 byte wildcard, then mov [*ptr], so max offset is 11
      (   
          (   // mov eax, ptr;
              uint8(@+j) == 0xA1 and 
              // *ptr == 0x3000
              uint32(pe.rva_to_offset(uint32(@ +j + 1) - pe.image_base)) == 0x3000 or
              uint32(pe.rva_to_offset(uint32(@ +j + 1) - pe.image_base)) == 0x1000
          )
      )   or  
    (   
        // push ptr;
        uint8(@+j) == 0xFF and 
        uint8(@ + j + 1) == 0x35 and 
        // *ptr == 0x3000                                                                                                                                                                                                                                                                                                                                 
        uint32(pe.rva_to_offset(uint32(@ + j + 2) - pe.image_base)) == 0x3000 or
        uint32(pe.rva_to_offset(uint32(@ + j + 2) - pe.image_base)) == 0x1000
    ) or <...trimmed>

This works but it's very ugly. It would be much nicer if I could define something like (A1|FF 35)(?P<offset>....) in my pattern, then in the condition (using python named capture group syntax) uint32(pe.rva_to_offset(uint32(@.group("offset")) - pe.image_base)) == 0x3000

Another example of ugly patterns that could be cleaned up with bit masking support:

  (33|31) (C1|C2|C3|C6|C7|
           C8|CA|CB|CD|CE|CF|
           D0|D1|D3|D5|D6|D7|
           D8|D9|DA|DD|DE|DF|
           E8|E9|EA|EB|EE|EF|
           F0|F1|F2|F3|F5|F7|
           F8|F9|FA|FB|FD|FE)             [0-24]      // xor <reg1>, <reg2> -- reg1 != reg2 and neither reg == ESP