psrok1 closed this issue 1 year ago.
Can you elaborate more on the concept of "staged matching"? I understand you are interested in those unreferenced strings in the rule, but it is not clear to me whether you are interested in them only when the rest of the rule matches, when it doesn't match, or in both cases.
Yes, we're interested in them only when the rest of the rule matches. By "staged matching" I mean that, once the rule matches, we run another Yara match with the optional strings. These are strings we keep out of the main `or`/`1 of` clause because they're not specific enough for general matching (we may get false positives), but it's also fine when they don't match at all (we just get only part of the information, which could be an indication that part of the rule must be fixed). In some cases we decided that it would be great to get complete information from a single rule in one pass, which is why the `0 of` hack appeared in our ruleset :smile: This is especially true when these strings are still fast enough for general matching.
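To make the difference between the two approaches concrete, here is a toy model in Python rather than YARA. It is only a sketch under stated assumptions: the rule is modeled as two hypothetical dicts of regexes (`REQUIRED` decides the match, `OPTIONAL` is only used for extraction), and `main_condition` stands in for a condition like `$a and $b`. None of these names come from YARA or malduck.

```python
import re

# Hypothetical rule split into required strings (they decide the match)
# and optional strings (only used for extraction).
REQUIRED = {"$a": rb"MZ", "$b": rb"This program"}
OPTIONAL = {"$opt_key": rb"KEY=[0-9A-F]{8}"}


def find_offsets(patterns, data):
    """Return {identifier: [start offsets]} for every pattern."""
    return {
        ident: [m.start() for m in re.finditer(pattern, data)]
        for ident, pattern in patterns.items()
    }


def main_condition(hits):
    # Stand-in for a rule condition such as `$a and $b`.
    return bool(hits["$a"]) and bool(hits["$b"])


def staged_match(data):
    """Stage 1 scans required strings; stage 2 (optional strings) runs only on a match."""
    hits = find_offsets(REQUIRED, data)
    if not main_condition(hits):
        return None
    hits.update(find_offsets(OPTIONAL, data))
    return hits


def one_pass_match(data):
    """Scan everything at once; optional strings never affect the result."""
    hits = find_offsets({**REQUIRED, **OPTIONAL}, data)
    return hits if main_condition(hits) else None
```

Both functions return the same offsets when the rule matches. The one-pass variant always pays for scanning the optional strings, which is why it only makes sense when those strings are cheap enough for general matching.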
To give a more concrete example: we write "normal" yara rules first, with the goal of hunting for a specific malware family. We have no use for `0 of` in them.
Then we put the yara rules in our malware extraction system (built using https://github.com/CERT-Polska/malduck, our framework, which is also used by a few other orgs). The system runs a callback for every string matched by a yara rule. Sometimes it's useful to add a "non-detecting" string that is only used during malware extraction (for example, a string that detects the encryption function, which our module uses to extract the encryption key). We don't want to mess with detections at this point, since the rule is already tested; we just want to have another optional string. Sometimes it's possible to do this another way, and sometimes the `0 of` hack comes in handy. The lack of a workaround is not the biggest issue, but we worry a bit about backward compatibility for our system (and for other users of our project).
This is probably not the only use case we had, but the one that we stumbled upon first.
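The per-string callback flow described above can be sketched roughly as follows. This is an assumption-laden illustration, not malduck's actual module API: `HANDLERS`, `on_string`, `extract_key`, `dispatch`, the `$enc_func` identifier and the "4-byte marker followed by a 4-byte key" layout are all hypothetical.

```python
from collections import defaultdict

# Hypothetical registry mapping a string identifier to extraction callbacks.
# This is not malduck's API, just the general shape of per-string dispatch.
HANDLERS = defaultdict(list)


def on_string(identifier):
    """Decorator registering a callback for matches of `identifier`."""
    def register(func):
        HANDLERS[identifier].append(func)
        return func
    return register


@on_string("$enc_func")
def extract_key(data, offset):
    # Hypothetical layout: a 4-byte marker followed by a 4-byte key.
    return data[offset + 4:offset + 8]


def dispatch(hits, data):
    """Call every registered handler once per match offset."""
    results = {}
    for identifier, offsets in hits.items():
        # .get() avoids inserting empty entries into the defaultdict.
        for handler in HANDLERS.get(identifier, []):
            for offset in offsets:
                results.setdefault(handler.__name__, []).append(handler(data, offset))
    return results
```

If `$enc_func` is never searched for (because the rule can't declare it without referencing it in the condition), it never shows up in `hits`, so no key is extracted.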
Interesting, I never thought that someone would be using `0 of them` in real-life rules. That's why `0 of them` was made a special case to make it coherent with `none of them`.
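A toy model of the two readings of `n of (...)` may help. It is a sketch, not YARA's implementation: `matched` holds one hypothetical boolean per string in the set, and the `zero_means_none` flag switches between the original "at least n" reading and the special case where `0 of` means the same as `none of`.

```python
def n_of(n, matched, zero_means_none):
    """Toy model of `n of (...)`; `matched` holds one bool per string in the set."""
    hits = sum(matched)
    if n == 0 and zero_means_none:
        # Special case: `0 of (...)` behaves like `none of (...)`.
        return hits == 0
    # Original reading: at least n strings of the set have matched,
    # so `0 of (...)` is always true.
    return hits >= n
```

Under the original reading `0 of ($opt*)` is a tautology that still references the strings; under the special case it becomes false as soon as any optional string matches, which is what broke the rules described in this issue.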
I'm not sure what the better solution is here. I don't dislike the option of going back to the original meaning of `0 of them` and treating `none of them` as a different case. I'm going to gather opinions among other YARA users and see what comes out of it.
I like the idea of marking a string as intentionally unreferenced. It seems like an elegant solution to this problem while keeping the semantics of `0 of them` and `none of them` clear. Also, intentionally unreferenced strings open up precisely the scenario you are describing: having your callback process a string that happened to match even if you don't need it in the condition. That seems like a powerful capability to have.
+1 for the string modifier idea!
After the change, I modified my rules to use `#optional_string >= 0` instead of `0 of ($optional_string)`.
In rules where I want some strings to be searched for without having them influence the condition, I use the `# >= 0` trick: `<real condition> and for all of ($optional_*): (# >= 0)`.
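Why the `# >= 0` trick leaves the result untouched can be shown with a small Python model of that condition. It is a sketch under assumptions: `counts` is a hypothetical map from string identifiers to their match counts (what `#` returns), and `fnmatch` stands in for YARA's `$optional_*` wildcard.

```python
import fnmatch


def for_all_of(pattern, counts, predicate):
    """Toy model of `for all of ($prefix*): (<predicate on #>)`.

    `counts` maps string identifiers to their number of matches (`#`).
    """
    selected = [n for ident, n in counts.items() if fnmatch.fnmatchcase(ident, pattern)]
    return all(predicate(n) for n in selected)


def rule_matches(real_condition, counts):
    # `<real condition> and for all of ($optional_*): (# >= 0)`
    # A match count is never negative, so the second operand is always true.
    return real_condition and for_all_of("$optional_*", counts, lambda n: n >= 0)
```

The optional strings are referenced, so the compiler accepts them, but the rule's result depends only on the real condition.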
Another point in favor of the string modifier, or against changing back the meaning of `0 of them`, is that it conveys intent and opens the door to optimizations. To be more specific, in order to decide whether a rule matches (i.e. whether its condition is true), it can be useful to know whether:

- it is enough to know if a string has matched at all (an `any of them` condition, for example), or
- the full details of each match are needed (`for any i in (1..#a): (!a[i] > 5)`, for example).
This is because it is usually cheaper for a regex engine to determine whether there is a match than to compute the exact boundaries of every match. Also, if a string already has a match and that information is enough, further checks for matches can be skipped.

The issue with the `0 of (...)` syntax is that it does not indicate that full match computation is needed for those strings, so it's a bit all-or-nothing: either optimize all strings and possibly miss some match callbacks, or optimize none of them and possibly lose some performance.

With a string modifier, however, it is possible to distinguish the two sets of strings and optimize only the strings that decide the match, not the "optional" ones.
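The split described above can be sketched in Python, assuming the scanner already knows which strings need full match details (the hypothetical `full_ids` set: optional strings that feed callbacks, or strings used in offset/length expressions). This only models the early-exit saving with Python's `re`; it does not reflect how YARA's own scanning engine works.

```python
import re


def scan(patterns, data, full_ids):
    """Existence-only search for most strings; full enumeration for `full_ids`."""
    results = {}
    for ident, pattern in patterns.items():
        if ident in full_ids:
            # Callbacks or offset/length checks need every match with its bounds.
            results[ident] = [(m.start(), m.end()) for m in re.finditer(pattern, data)]
        else:
            # The condition only needs to know whether there is at least one match,
            # so the search can stop at the first hit.
            results[ident] = re.search(pattern, data) is not None
    return results
```

With `0 of (...)` the scanner cannot tell which strings belong in `full_ids`; an explicit marker on the string would make that set known at compile time.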
I put together a PR that would allow unreferenced strings if they are prefixed with `$_`. They should be treated completely normally, except that you don't have to reference them in the condition. They will still be searched for normally and will be available in callbacks. I liked the idea of using `$_` to signal to the compiler that a string is intentionally unreferenced, instead of introducing another modifier, because modifiers are meant to indicate to the reader that the string is being modified.
This shouldn't break anyone who is already using `$_` for any reason; it just gives them the option to leave those strings unreferenced in the future.
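The rule the PR proposes can be summarized with a toy Python check. It is a sketch, not the PR's code: `declared` and `referenced` are hypothetical inputs (a real compiler would derive them by parsing the rule, including `them` and wildcard sets), and the message text is only illustrative.

```python
def unreferenced_errors(declared, referenced):
    """Toy model of the proposed compiler check (not YARA's actual code).

    `declared` lists string identifiers from the `strings:` section and
    `referenced` is the set of identifiers the condition mentions.
    Identifiers starting with `$_` are exempt, so they can stay
    unreferenced and are still searched for and reported to callbacks.
    """
    return [
        f'unreferenced string "{ident}"'
        for ident in declared
        if ident not in referenced and not ident.startswith("$_")
    ]
```

Because only the identifier prefix matters, existing rules that already use `$_` names and reference them keep compiling exactly as before.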
Is your feature request related to a problem? Please describe.
We spotted that commit https://github.com/VirusTotal/yara/commit/7a99e6dd1212e3dc7e170d5b229703e796d7a314, released in v4.2.3, changed the behavior of `0 of (...)`, so it broke some of our Yara rules: `0 of` was used in our rules to work around the fact that Yara doesn't allow strings that are not referenced in the condition. In our case we wanted to have "optional" strings that don't affect the match result, but are still looked up so we can get their offsets and extract additional information.

The previous meaning of `x of (...)` was "match if at least x of the specified strings are found", so `0 of` (which was always true) still fit the original semantics well. I understand that it started to be confusing when the `none` keyword arrived, since `none of (...)` behaved the same as `0 of (...)`, as noted in https://github.com/VirusTotal/yara/issues/1695.

Describe the solution you'd like
It would be nice to have another way to indicate that some strings are intentionally unreferenced in the condition but should still be matched. Right now we're doing "staged matching" in multiple places, spawning another Yara match with the optional strings and an `any of them` condition. Possible solutions I would like are:

- a string modifier that marks a string as intentionally unreferenced, or
- treating `none of` and `0 of` as different cases, leaving the original `0 of` meaning.

Let me know what you think about it!