VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

Maximum offset distance in condition - perhaps `strings` module idea? #1897

Open tlansec opened 1 year ago

tlansec commented 1 year ago

Is your feature request related to a problem? Please describe.

Sometimes I have a rule with a relatively complex set of strings. Most of the time I'm interested in matching against a single file, but I also want to match the rule against memory samples, so I don't want to use a filesize constraint because a memory sample is very large. Instead, what I want to do is specify the maximum distance between any matched strings.
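For context, a rough approximation of this is already possible in current YARA syntax, at the cost of picking one string as an anchor. The sketch below is only an illustration: the rule name, the three placeholder strings, and the 4096-byte window are assumptions, not part of the request. It requires every other string to appear within the window that starts at some occurrence of `$a`.

```yara
rule cluster_near_anchor_sketch
{
    strings:
        // Placeholder strings, chosen only for illustration.
        $a = "alpha"
        $b = "beta"
        $c = "gamma"

    condition:
        // For at least one occurrence of the anchor $a ...
        for any i in (1..#a) : (
            // ... every other string has a match in the 4096 bytes
            // that start at that occurrence (window size is an assumption).
            for all of ($b, $c) : (
                $ in (@a[i] .. @a[i] + 4096)
            )
        )
}
```

This only looks forward from each `$a` hit, so it assumes `$a` comes first in the cluster, and the string set has to be maintained by hand. It does not express "maximum distance between any matched strings" in general, which is the gap the request describes.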

Describe the solution you'd like

This may be an idea for the distant future, but I'd like a solution where I can (for a complex set of strings) have a special variable like:

strings:
     [whatever strings I want]
condition:
     all of them and
     strings.max_offset - strings.min_offset < N

This strings variable (or module) could also allow inspection of:

wxsBSD commented 1 year ago

At least one of these could be done already: total string matches could be done with the #a syntax.
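As an illustration of the `#a` approach in today's syntax, the sketch below sums the per-string match counts. The rule name, the three placeholder strings, and the threshold of 5 are assumptions made up for the example.

```yara
rule total_matches_sketch
{
    strings:
        // Placeholder strings, chosen only for illustration.
        $a = "alpha"
        $b = "beta"
        $c = "gamma"

    condition:
        // #a is the number of matches of $a; adding the counts gives
        // the total number of matches across all three strings.
        #a + #b + #c >= 5
}
```

Each string has to be named explicitly in the sum, so the condition grows with the number of strings in the rule.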

I don't see an easy way to do the others at the moment, as I don't think there is a way to pass a YARA string into a function. It is a good idea, though.

I had initially thought about wanting to express the logic like this:

math.abs($a.max_offset() - $b.min_offset()) < 100

While that would be a nice, extensible way to do it, it would mean the compiler gets more and more complicated with each new function or attribute we want to expose on a string match. As such, I'm liking your idea of expressing this in a module:

math.abs(strings.max_offset($a) - strings.min_offset($b)) < 100

This all assumes we eventually grow support for using YARA strings as arguments. I had wanted to spend my time on yara-x, but this is such an intriguing idea that I'm curious what @plusvic says about it. I feel like this is something I could likely implement fairly quickly too.
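For the two-string case, something close to the expression above can be written with the existing offset syntax. This is a sketch under assumptions: it relies on occurrences being indexed in increasing offset order (so `@a[#a]` is the last match of `$a` and `@b[1]` the first match of `$b`), and on `math.abs` being available in the YARA version in use. The rule name and strings are placeholders.

```yara
import "math"

rule last_a_near_first_b_sketch
{
    strings:
        // Placeholder strings, chosen only for illustration.
        $a = "alpha"
        $b = "beta"

    condition:
        // Both strings must match, otherwise the offsets are undefined.
        $a and $b and
        // Distance between the last $a and the first $b is under 100 bytes.
        math.abs(@a[#a] - @b[1]) < 100
}
```

This works for a fixed pair of named strings but does not generalize to "all strings in the rule", which is where a function or module taking string identifiers would help.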

tlansec commented 1 year ago

For this:

total string matches could be done with the #a syntax.

It could be done, but for a rule containing 30 strings the rule becomes cumbersome to read and write.

[...] allowing for YARA strings to be used as arguments

I think this is the most elegant solution actually, because the condition you write is really the most common type of thing I want to express.

plusvic commented 1 year ago

This request is interesting because it exposes the current limitations in the language. I agree with @wxsBSD's comment: in order to implement this (and more powerful features in the future) we may need to implement one of the following features (or both):

1) String identifiers as arguments to functions, like in foo($a), where foo has access to all the information about the $a pattern, including the current matches.

2) Methods associated with string identifiers, like in $a.foo(). This is really a special case of 1; once you have 1, implementing this should be straightforward.

There are more cases in which this would be helpful, and I'm getting more and more convinced that we must introduce something like this in order to unleash a series of enhancements that would bring YARA to the next level in terms of expressiveness.

I wouldn't implement this in the current C implementation, though. It would require a lot of changes, and my focus is now on bringing YARA-X forward. I also think that this is going to be easier to implement in YARA-X.

What we could start doing collectively is designing the changes that we want, writing RFCs like this one https://github.com/VirusTotal/yara/discussions/1783. Even if we don't start implementing these ideas right away, we can start working on defining and polishing the ideas. The delay may be beneficial, as the ideas have time to settle down and mature before they are implemented.

tlansec commented 1 year ago

I am happy to try and come up with an RFC like the one cited if you like (although maybe Wes has more experience writing such documents). Should RFCs go in "Discussions" as their own Conversation or do they get put in the Road Ahead discussion?

plusvic commented 1 year ago

For the time being I would put each RFC in an independent discussion.