VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

A way to have optional strings in the rule as "0 of" behavior changed in v4.2.3 #1937

Closed psrok1 closed 1 year ago

psrok1 commented 1 year ago

Is your feature request related to a problem? Please describe.

We spotted that commit https://github.com/VirusTotal/yara/commit/7a99e6dd1212e3dc7e170d5b229703e796d7a314 released in v4.2.3 changed 0 of (...) behavior, so it broke some of our Yara rules:

rule [redacted]
{
    strings:
        [redacted]
    condition:
        ( 2 of ($op_*) )
        and 1 of ($str_encoded_*)
        and 0 of ($set_new_cookie, $magic*)  // for callback only
}

0 of was used in our rules to overcome the fact that Yara doesn't allow matching strings that are not referenced in the condition. In our case we wanted to have "optional" strings that don't affect the match result, but are still looked up so we can get their offsets and extract additional information.

The previous meaning of x of (...) was "match if at least x of the specified strings are found", so 0 of (...) was always true, which fit the original semantics well.

I understand that this became confusing once the none keyword arrived, which is why 0 of (...) was made to behave the same as none of (...), as noted in https://github.com/VirusTotal/yara/issues/1695
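To make the change concrete, here is a minimal sketch of a rule that relies on 0 of. The rule name and string contents are hypothetical placeholders, not taken from the redacted rule above; the comments restate the behavior described in this issue.

```yara
rule zero_of_example
{
    strings:
        $required = "required marker"
        $opt_a = "optional marker A"
        $opt_b = "optional marker B"
    condition:
        // Before v4.2.3: "0 of" meant "at least 0 of", so this clause was
        // always true and the rule matched whenever $required was found.
        // Since v4.2.3: "0 of" is treated like "none of", so the rule no
        // longer matches when $opt_a or $opt_b is present in the data.
        $required and 0 of ($opt_*)
}
```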

Describe the solution you'd like

It would be nice to have another way to indicate that some strings are intentionally unreferenced in the condition, but should still be matched. Right now we're doing "staged matching" in multiple places: spawning another Yara match with the optional strings and an any of them condition.
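A rough sketch of what such staged matching could look like, assuming the two rules live in separate rule files that are compiled and scanned independently. Rule names and string contents are hypothetical.

```yara
// Stage 1: the tested detection rule, scanned first.
rule family_detect
{
    strings:
        $op_1 = "opcode marker 1"
        $op_2 = "opcode marker 2"
    condition:
        all of them
}

// Stage 2: kept in a separate rule file and scanned only against samples
// that stage 1 already flagged. It exists purely to report the offsets of
// the optional strings to the extraction code.
rule family_extract
{
    strings:
        $set_new_cookie = "cookie marker"
        $magic_1 = "magic marker"
    condition:
        any of them
}
```

The cost of this layout is a second scan per matching sample, which is what the single-rule 0 of hack avoided.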

Possible solutions I would like are:

Let me know what you think about it!

plusvic commented 1 year ago

Can you elaborate more on the concept of "staged matching"? I understand you are interested in those unreferenced strings in the rule, but it is not clear to me whether you are interested in them only when the rest of the rule matches, or when it doesn't match, or in both cases.

psrok1 commented 1 year ago

Yes, we're interested in them only when the rest of the rule matches. By "staged matching" I mean:

In some cases we decided that it would be great to get complete information from a single rule in one pass; that's why the 0 of hack appeared in our ruleset :smile: Especially if these strings are still fast enough for general matching.

msm-code commented 1 year ago

To give a more concrete example: we write "normal" yara rules first, with the goal of hunting for the specific malware family. We don't have use for 0 of in them.

Then we put the yara rules in our malware extraction system (built using https://github.com/CERT-Polska/malduck, our framework used by a few other orgs). The system runs a callback for every string matched by a yara rule. And now, sometimes it's useful to add a "non-detecting" string that will only be used in the malware extraction (for example, a string that detects the encryption function, which we use in our module to extract the encryption key). We don't want to mess with detections at this point, since the rule is already tested; we just want to have another optional string. Sometimes it's possible to do this another way, and sometimes the 0 of hack comes in handy. The lack of a workaround is not the biggest issue, but we worry a bit about the backward compatibility of our system (and other users of our project).

This is probably not the only use case we had, but the one that we stumbled upon first.

plusvic commented 1 year ago

Interesting, I never thought that someone would be using 0 of them in real-life rules. That's why 0 of them was made a special case to make it coherent with none of them.

I'm not sure what's the better solution here, I don't dislike the option of going back to the original meaning for 0 of them and treating none of them as a different case. I'm going to gather opinions among other YARA users and see what comes out of it.

wxsBSD commented 1 year ago

I like the idea of marking a string as intentionally unreferenced. Seems like an elegant solution to this problem while keeping the semantics of 0 of them and none of them clear. Also, intentionally unreferenced strings open up precisely the scenario you are describing: having your callback process a string that happened to match even if you don't need it in the condition. Seems like a powerful capability to have.

mgoffin commented 1 year ago

+1 for the string modifier idea!

malvidin commented 1 year ago

I modified my rules to use #optional_string >= 0 instead of 0 of $optional_string after the change.

vthib commented 1 year ago

In my rules where I want to compute some strings but not have them influence the condition, I use the # >= 0 trick: <real condition> and for all of ($optional_*): (# >= 0).
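Both variants of the count trick mentioned so far can be sketched as follows. Rule names and string contents are hypothetical; the first rule uses the single-string form, the second the set form.

```yara
rule count_trick_single
{
    strings:
        $detect = "detection marker"
        $optional_string = "optional marker"
    condition:
        // A match count is never negative, so the second clause is always
        // true but still references $optional_string.
        $detect and #optional_string >= 0
}

rule count_trick_set
{
    strings:
        $detect = "detection marker"
        $optional_key = "optional key marker"
        $optional_iv = "optional iv marker"
    condition:
        // Same idea applied to every string in the $optional_* set.
        $detect and for all of ($optional_*) : ( # >= 0 )
}
```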

Another point in favor of the string modifier, or against changing back the 0 of them meaning, is that it conveys intent and opens the door for optimizations. To be a bit more specific, it can be useful to know whether, in order to find if a rule matches or not (i.e. if the condition is true or not):

The issue with the 0 of (...) syntax is that it does not indicate that the full match computations are needed for those strings. So it's a bit of an all or nothing: either optimize all strings, and you might miss some match callbacks, or do not optimize any strings, and you might lose a bit of performance. With a string modifier, however, it is possible to distinguish the two sets of strings, and then optimize only the matching strings and not the "optional" ones.

wxsBSD commented 1 year ago

I put together a PR that would allow for unreferenced strings if they are prefixed with $_. They should be treated completely normally other than you don't have to reference them in the condition. They will still be searched for normally and available in callbacks. I liked the idea of using $_ to signal to the compiler that it was intentionally unreferenced instead of introducing another modifier, because modifiers are meant to indicate to the reader that the string is being modified.

This shouldn't break anyone who is already using $_ for any reason - just gives them the option to make them unreferenced in the future.
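A minimal sketch of how a rule might use this, assuming a YARA build that includes the PR. The rule name and string contents are hypothetical.

```yara
rule underscore_prefix_example
{
    strings:
        $detect = "detection marker"
        // The $_ prefix marks this string as intentionally unreferenced:
        // it is still searched for and reported to callbacks, but it does
        // not have to appear in the condition.
        $_enc_func = "encryption routine marker"
    condition:
        $detect
}
```

The detection logic stays exactly as tested, while extraction code can still read the offsets of $_enc_func when the rule matches.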

plusvic commented 1 year ago

#1941 has been merged, so we can now have optional strings by prefixing the identifier with an underscore (e.g. $_unused).