VirusTotal / yara

The pattern matching swiss knife
https://virustotal.github.io/yara/
BSD 3-Clause "New" or "Revised" License

regex: {0} and/or {0,0} considered as invalid repeat interval #2037

Closed mrihtar closed 5 months ago

mrihtar commented 5 months ago

Describe the bug When specifying a repeat interval in a regular expression, libyara does not accept {0} or {0,0} as a valid repeat interval, although both are valid in other regex engines.

To Reproduce Example rule for matching a specific IPv4 address (10.20.30.40):

rule ip {
  strings:
    $ip = /([^0-9]|){0}10\.20\.30\.40([^0-9]|){0}/
  condition:
    any of them
}

Yara reports: error: rule "ip" in ip.yar(3): invalid regular expression "$ip": bad repeat interval

Expected behavior The above rule "$ip" should match the specified IPv4 address only when it is not preceded or followed by a decimal digit, that is, with a non-digit character or nothing on either side of the IP. Although there are other ways to match an IP address, this specific regex should work, and it does work in other regex engines.


Additional context The problem seems to be at line 144 of yara/libyara/re_lexer.l:

144: if (hi_bound == 0 && lo_bound == 0)

Why is this considered invalid?

plusvic commented 5 months ago

That wasn't intentional, it was an oversight. I can add support for {0} and {0,0} just for compatibility with other regexp engines, but I don't think it will work the way you seem to expect. You said:

The above rule "$ip" should match specified IPv4 without any non-decimal number or nothing before and after IP.

That's not how the {0} quantifier works in regex engines. For instance, the regex /foo(bar){0}/ doesn't mean: match the "foo" string if not followed by "bar". What it really means is: match the "foo" string, period. It's completely equivalent to /foo/, and it will match the "foo" substring in "foobar".
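The equivalence described above can be checked in another engine. The following is a minimal sketch using Python's re module, not YARA; the helper matched_text and the sample strings are made up for illustration.

```python
import re

# The two patterns that are claimed to be equivalent.
WITH_ZERO = re.compile(r"foo(bar){0}")
PLAIN = re.compile(r"foo")


def matched_text(pattern, text):
    """Return the substring matched by the first hit, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Hypothetical sample inputs, chosen to include "foo" followed by "bar".
samples = ["foo", "foobar", "xfoobarx", "bar"]

# For each sample, record what each pattern matched.
results = [(s, matched_text(WITH_ZERO, s), matched_text(PLAIN, s)) for s in samples]

for text, with_zero, plain in results:
    print(f"{text!r}: foo(bar){{0}} -> {with_zero!r}, foo -> {plain!r}")
```

In this sketch both patterns report the same match for every sample, including "foobar", where {0} does not stop "foo" from matching in front of "bar".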

This is why nobody has missed this feature before. There's no real use for {0} or {0,0}.
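The same point can be illustrated with the regex from the report. This is a sketch in Python's re module, which accepts {0}; it says nothing about how libyara would behave once support is added, and the test strings below are invented for the example.

```python
import re

# The regex from the reported rule, as written by the reporter.
REPORTED = re.compile(r"([^0-9]|){0}10\.20\.30\.40([^0-9]|){0}")

# The same regex with the {0}-quantified groups removed.
STRIPPED = re.compile(r"10\.20\.30\.40")


def first_span(pattern, text):
    """Return (start, end) of the first match, or None if nothing matches."""
    m = pattern.search(text)
    return m.span() if m else None


# Hypothetical inputs: the bare IP, the IP in a sentence, and the IP
# embedded inside a longer dotted number that is a different address.
inputs = ["10.20.30.40", "host 10.20.30.40 up", "110.20.30.401"]

spans = {text: (first_span(REPORTED, text), first_span(STRIPPED, text)) for text in inputs}

for text, (reported, stripped) in spans.items():
    print(f"{text!r}: reported={reported}, stripped={stripped}")
```

In Python both patterns give identical spans for all three inputs, and both match inside "110.20.30.401", so the {0} groups do not enforce a digit boundary around the address.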

You can find more context in this discussion: https://stackoverflow.com/questions/7511600/does-the-quantifier-0-make-sense-in-some-scenarios