VirusTotal / yara-python

The Python interface for YARA
http://virustotal.github.io/yara/
Apache License 2.0

Cannot obtain meta information for rules after yara.compile() #195

Closed ejkitchen closed 2 years ago

ejkitchen commented 2 years ago

I would like to be able to extract the meta-information for all rules in the compiled space before I start any file scanning. Right now, the only way I have found to do this is to actually run a match on a file and then extract the meta-information from the match class by looking at the first offset and using the Yara file itself before I start scanning. Since we're only defining rules with simple strings this is just a stop-gap. We have information inside the meta section that we need to use prior to scanning anything and triggering any rules. Is there a way to do this?

Here's the code I am using now that works with simple strings as rules:

import yara

fn = "path to yara file"
rules = yara.compile(fn)

# would like to access the meta section here prior to scanning

m = rules.match(fn)[0].meta

If there is no way to do this currently, can I go ahead and extend the code to support this (assuming I follow all community guidelines)? There should be no reason we can't examine all of the compiled rules and retrieve their respective meta sections prior to a scan.

P.S. We are using the meta section to extend the scanning rules and have implemented a replacement grammar so that our Python app doesn't just find things but also removes things in some cases from files as it goes along.

wxsBSD commented 2 years ago

You can definitely do this. We have scripts that run on our rules to ensure certain metadata exists.

ejkitchen commented 2 years ago

Hi Wesley,

Just want to make sure I understand! Can I go ahead and extend the code, or can this already be done without scanning files?

wxsBSD commented 2 years ago

Sorry, I was stuck on mobile and should have waited to give you a more reasonable response. You can do it without scanning files:

>>> import yara
>>> rules = yara.compile(source="rule a { meta: foo = 1 condition: true }")
>>> for rule in rules:
...     print(rule.meta)
...
{'foo': 1}

ejkitchen commented 2 years ago

Thanks Wesley! I completely missed that. I ran some debug code on that class to get the attributes and it didn't come up for some reason. I am good to go. Again thanks for your quick response.
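For reference, the iteration shown above can be turned into a pre-scan check along the lines wxsBSD describes (making sure certain metadata exists). The following is only a minimal sketch, not part of yara-python: `meta_by_identifier` and `missing_meta` are hypothetical helper names, and the `SimpleNamespace` objects are stand-ins that mimic the `identifier` and `meta` attributes of compiled rules so the example runs without the library installed.

```python
from types import SimpleNamespace


def meta_by_identifier(rules):
    """Map each rule's identifier to a copy of its meta dict."""
    return {rule.identifier: dict(rule.meta) for rule in rules}


def missing_meta(rules, required_keys):
    """Return {identifier: [missing keys]} for rules lacking required metadata."""
    problems = {}
    for rule in rules:
        missing = [key for key in required_keys if key not in rule.meta]
        if missing:
            problems[rule.identifier] = missing
    return problems


# Stand-in rule objects (assumption: real compiled rules expose the same
# `identifier` and `meta` attributes, as the thread indicates).
# With yara-python installed, something like this should work instead:
#   import yara
#   demo_rules = list(yara.compile(filepath="path to yara file"))
demo_rules = [
    SimpleNamespace(identifier="a", meta={"foo": 1, "author": "x"}),
    SimpleNamespace(identifier="b", meta={"foo": 2}),
]

print(meta_by_identifier(demo_rules))
print(missing_meta(demo_rules, ["foo", "author"]))
```

Building a plain dict keyed by identifier also gives lookup by rule name without needing the compiled rules object to be subscriptable. Wrapping the compiled rules in `list(...)` first is a cautious choice if they need to be walked more than once.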

BTW, this is what I ran on the rules object to see its methods and attributes. attrs(rules) didn't list meta for some reason.

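from inspect import getmembers
from types import FunctionType

def api(obj):
    # Public names only: skip anything that starts with an underscore.
    return [name for name in dir(obj) if name[0] != '_']

def attrs(obj):
    # Skip names that the object's type defines as properties or plain functions.
    disallowed_properties = {
        name for name, value in getmembers(type(obj))
        if isinstance(value, (property, FunctionType))}
    return {
        name: getattr(obj, name) for name in api(obj)
        if name not in disallowed_properties and hasattr(obj, name)}

ejkitchen commented 2 years ago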

Hi Wesley, I have one final question. How do I access the strings: section of rules prior to doing a file match? I tried the obvious rule.strings from within the for loop above, but that didn't work. The rules object is also not subscriptable, so I can't do rules[rule.identifier].strings. I did a dump of the rule object and all I get is the following:

{'identifier', 'is_global', 'is_private', 'meta'}, 'tags': []}

wxsBSD commented 2 years ago

Strings and conditions are not available after compilation. This is because yara-python is a thin wrapper around libyara, and it needs to use the built-in libyara compiler to make sure the rules are valid. There are other tools that you can use as a parser (https://github.com/virustotal/gyp or https://github.com/plyara/plyara) that can do what you want. The downside is that they have to be kept up to date with the official grammar, but both of them tend to be updated pretty quickly after each new release of YARA.

As for rules objects not being subscriptable, that is true. While it may be possible to do what you want with some improvements to yara-python, I'm not sure it's worth it, since the immediate use case I have seen for accessing rules is some kind of validation of the structure of the rule (that is, checking that certain metadata exists or that it is not global, etc). Since the validation is applied to all rules, it doesn't make sense to have a way to access a given rule by identifier. I'm not opposed to seeing it done, but it may be better suited to something using plyara, which has an understanding of the grammar and produces something more user-friendly for the kind of work it sounds like you may be doing.
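To illustrate the parser route, here is a rough sketch of pulling the strings: section out of parsed rules. It rests on assumptions not stated in this thread: that plyara returns a list of dicts with `rule_name` and `strings` keys, each string having `name` and `value`, and that `plyara.Plyara().parse_string(...)` is the entry point. The `sample` data is hand-written in that assumed shape so the example runs without plyara installed, and `strings_by_rule` is a hypothetical helper name.

```python
def strings_by_rule(parsed_rules):
    """Map each rule name to a {string name: string value} dict."""
    return {
        rule["rule_name"]: {
            entry["name"]: entry["value"] for entry in rule.get("strings", [])
        }
        for rule in parsed_rules
    }


# Hand-written sample in the shape plyara is assumed to produce.
# With plyara installed, something like this should give real data:
#   import plyara
#   with open("path to yara file") as handle:
#       sample = plyara.Plyara().parse_string(handle.read())
sample = [
    {
        "rule_name": "a",
        "metadata": [{"foo": 1}],
        "strings": [{"name": "$s1", "value": "hello", "type": "text"}],
    },
    # A rule with no strings: section may simply lack the key.
    {"rule_name": "b", "metadata": []},
]

print(strings_by_rule(sample))
```

Because this works on the rule source rather than on compiled rules, it is worth still compiling the same file with yara-python so that libyara remains the final check on validity.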

ejkitchen commented 2 years ago

Hi Wesley,

Thanks for the response. I installed plyara and that's perfect for our needs. I somehow missed it when searching for yara parsers.