VirusTotal / yara-python

The Python interface for YARA
http://virustotal.github.io/yara/
Apache License 2.0

yara-python 3.11.0 failed match regex with wide attribute #122

Closed maayan-sh closed 2 years ago

maayan-sh commented 4 years ago

I have a regular expression that matches content in a file, but yara-python doesn't find it when I add the wide modifier (yara.exe run from cmd does find it).

For example, the YARA rule is:

rule example {
  strings:
    $e0 = /expression/ wide
  condition:
    $e0
}

The Python commands are:

import yara
rules = yara.compile()
rules.match()

But there is no match. I only get matches when I remove 'wide'. I have tried this on Python 3.6.8 and 3.7.3.

plusvic commented 4 years ago

When used alone, the wide modifier makes the regexp match wide strings only. If you want to match both wide strings and normal ASCII strings, you must use wide and ascii together. Your file probably contains the pattern in ASCII form, but not in wide form.
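To illustrate the difference at the byte level, here is a minimal sketch that uses only the Python standard library and does not call YARA at all. It assumes the documented behaviour of wide, where each ASCII character is interleaved with a zero byte; the helper name to_wide and the sample file contents are hypothetical and exist only for this illustration.

```python
def to_wide(text: str) -> bytes:
    """Hypothetical helper: interleave each ASCII byte with a zero byte."""
    out = bytearray()
    for byte in text.encode("ascii"):
        out += bytes([byte, 0])
    return bytes(out)


ascii_form = "expression".encode("ascii")
wide_form = to_wide("expression")

# For ASCII text, the interleaved layout is identical to UTF-16LE without a BOM.
assert wide_form == "expression".encode("utf-16-le")

# Made-up file contents that hold only the ASCII form of the pattern.
ascii_only_file = b"some header " + ascii_form + b" some trailer"

# The wide byte sequence is absent, so a pattern declared with wide alone
# has nothing to match in this data; the ASCII form is present.
print(wide_form in ascii_only_file)   # False
print(ascii_form in ascii_only_file)  # True
```

If the sample file looks like ascii_only_file here, a rule with wide alone is expected to miss it, while wide ascii covers both layouts.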

maayan-sh commented 4 years ago

My original rule contains nocase wide ascii for the regex. The command line yara finds it but yara-python doesn't.

malvidin commented 3 years ago

I can't reproduce with Python 3.7 and yara-python 3.10.0 or yara-python 4.0.4.

This test appears to be 5 years old and should cover the case you describe, although the case isn't directly covered in tests.py. https://github.com/VirusTotal/yara/blob/a07ec0812de0aa2402f4bf74aea3a49248a7aa48/tests/test-rules.c#L534

Because the following appears to validate that YARA is matching as intended, we would need a sample rule/file to confirm this issue existed or persists.

import yara

rule_source = """
rule example {
  strings:
    $e0 = /expression/ nocase wide ascii
  condition:
    $e0
}
"""
rules = yara.compile(source=rule_source)

ascii_expression = "expression".encode("ascii")
with open("ascii_expression.txt", "wb") as f:
    f.write(ascii_expression)

# wide interleaves each character with a zero byte (UTF-16LE layout, no BOM)
wide_expression = "expression".encode("utf-16-le")
with open("wide_expression.txt", "wb") as f:
    f.write(wide_expression)

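ascii_matches = rules.match(data=ascii_expression)
wide_matches = rules.match(data=wide_expression)

# Each encoding must produce a match, and in-memory results must agree with on-disk results
assert ascii_matches and ascii_matches == rules.match("ascii_expression.txt")
assert wide_matches and wide_matches == rules.match("wide_expression.txt")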
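One detail worth checking when building wide test data in Python is the codec name. The sketch below, which uses only the standard library, compares the generic "utf-16" codec with "utf-16-le". In CPython the generic codec prepends a byte order mark and uses the machine's native byte order, whereas "utf-16-le" produces the bare character-plus-zero layout. The variable names are illustrative only.

```python
import codecs
import sys

text = "expression"
generic = text.encode("utf-16")    # BOM followed by native byte order
little = text.encode("utf-16-le")  # no BOM, low byte first
big = text.encode("utf-16-be")     # no BOM, high byte first

# The generic codec always starts with a two-byte BOM.
has_bom = generic[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# After the BOM, the payload follows the byte order of the current machine.
native = little if sys.byteorder == "little" else big
payload_is_native = generic[2:] == native

print(has_bom, payload_is_native)  # True True
print(len(generic), len(little))   # 22 20
```

On a little-endian machine the pattern still appears after the BOM, so both encodings would be found by a wide pattern there; on a big-endian machine the generic codec would produce the bytes in the opposite order.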