VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

Yara regex doesn't accept non-capturing groups #1900

Closed djlukic closed 1 year ago

djlukic commented 1 year ago

Hi,

I tried to optimize some YARA rules and their memory usage by using non-capturing groups in regexes instead of capturing groups, and I am getting: invalid regular expression "$": syntax error, unexpected '?'

Let's say I want to match references to Temp folders by using: (?:Local\\Temp|Windows\\Temp) instead of (Local\\Temp|Windows\\Temp)

where I specify that it must be either the Local AppData or the Windows temp folder, and not some randomly created Temp folder, hence the brackets.
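To make the intent of the grouped alternation concrete, here is a minimal sketch in Python's `re` module (not YARA itself). The sample paths are hypothetical and only there for illustration; the pattern text is the capturing form quoted above.

```python
import re

# Pattern text from the comment above; in a raw string, \\ matches one
# literal backslash in the scanned text.
TEMP_DIR = re.compile(r"(Local\\Temp|Windows\\Temp)")

# Hypothetical sample paths, for illustration only.
paths = [
    r"C:\Users\bob\AppData\Local\Temp\a.exe",
    r"C:\Windows\Temp\b.dll",
    r"C:\Projects\Temp\c.txt",
]

# True where the path refers to one of the two intended Temp folders.
matches = [bool(TEMP_DIR.search(p)) for p in paths]
print(matches)
```

Only the first two paths match; the third contains a `Temp` folder that is preceded by neither `Local` nor `Windows`.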

Thank you!

plusvic commented 1 year ago

Can you explain a bit more about what you want to achieve? Why do you think that using non-capturing groups has an effect on memory usage?

YARA doesn't support capture groups because they have no impact on whether the regular expression matches or not; they are only useful for capturing the portions of the matching data that correspond to certain parts of the regular expression. Without a mechanism for using the captured groups outside of the regexp, capture groups themselves are useless.
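The point that grouping style does not change whether (or where) a pattern matches can be checked in an engine that does support both forms. The following is a sketch using Python's `re` module, not YARA's regex engine; the sample string is hypothetical.

```python
import re

capturing = re.compile(r"(Local\\Temp|Windows\\Temp)")
non_capturing = re.compile(r"(?:Local\\Temp|Windows\\Temp)")

# Hypothetical sample string, for illustration only.
sample = r"C:\Windows\Temp\b.dll"

m1 = capturing.search(sample)
m2 = non_capturing.search(sample)

# Both forms match the same region of the input.
same_span = m1.span() == m2.span()

# The only difference is what is retained for use outside the regexp.
captured = m1.groups()       # one captured substring
none_captured = m2.groups()  # empty tuple: nothing captured

print(same_span, captured, none_captured)
```

The two patterns differ only in what `groups()` returns afterwards, which is exactly the part that has no consumer in a YARA rule.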

djlukic commented 1 year ago

I did some measurements using the tracemalloc module (https://docs.python.org/3/library/tracemalloc.html) for capturing and non-capturing groups, and the regular expressions with non-capturing groups allocated less memory. Are you saying that this wouldn't have an impact in YARA?
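For reference, a measurement of this kind could look roughly like the sketch below. This is not the script actually used; the helper name `peak_bytes`, the sample text, and the repeat count are assumptions made for illustration, and the numbers it prints describe Python's `re` module only.

```python
import re
import tracemalloc


def peak_bytes(pattern, text, repeats=1000):
    """Return the peak bytes traced while compiling and searching a pattern.

    Hypothetical helper: keeps every match object alive so that any
    per-match storage for captured groups is included in the peak.
    """
    re.purge()  # drop cached compiled patterns so compilation is measured too
    tracemalloc.start()
    try:
        rx = re.compile(pattern)
        results = [rx.search(text) for _ in range(repeats)]
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Hypothetical sample text, for illustration only.
text = r"C:\Users\bob\AppData\Local\Temp\a.exe"

cap = peak_bytes(r"(Local\\Temp|Windows\\Temp)", text)
noncap = peak_bytes(r"(?:Local\\Temp|Windows\\Temp)", text)
print(f"capturing: {cap} bytes, non-capturing: {noncap} bytes")
```

The absolute values depend on the Python version and platform, so they should be read as a relative comparison within one run at most.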

plusvic commented 1 year ago

Exactly, capture groups don't have any effect on memory usage in YARA. That can be true in Python, because regular expressions in Python actually capture the data matched by capture groups, but in YARA capture groups are not really supported.