[Correlations] Issue with rule parsing

Res260 commented 10 months ago

I'm getting my hands on correlations and am trying to support that in our PySigma Backend.

Given this rule taken from the official sigma-specification repo:

title: Correlation - Multiple Failed Logins Followed by Successful Login
id: b180ead8-d58f-40b2-ae54-c8940995b9b6
status: experimental
description: Detects multiple failed logins by a single user followed by a successful login of that user
references:
    - https://reference.com
author: Florian Roth (Nextron Systems)
date: 2023/06/16
correlation:
   type: temporal
   rules:
      - multiple_failed_login
      - successful_login
   group-by:
    - User
   timespan: 10m
   ordered: true
falsepositives:
    - Unlikely
level: high
---
id: a8418a5a-5fc4-46b5-b23b-6c73beb19d41
description: Detects multiple failed logins within a certain amount of time
name: multiple_failed_login
correlation:
    type: event_count
    rules:
      - failed_login
    group-by:
      - User
    timespan: 10m
    condition:
      gte: 10
---
id: 53ba33fd-3a50-4468-a5ef-c583635cfa92
description: Detects a single failed login
name: failed_login
logsource:
  product: windows
  service: security
detection:
    selection:
      EventID:
        - 529
        - 4625
    condition: selection
---
id: 4d0a2c83-c62c-4ed4-b475-c7e23a9269b8
description: Detects a successful login
name: successful_login
logsource:
  product: windows
  service: security
detection:
    selection:
        EventID:
          - 528
          - 4624
    condition: selection

When I call SigmaCollection.from_yaml(), I get this error for the sub-rule with ID a8418a5a-5fc4-46b5-b23b-6c73beb19d41:

sigma.exceptions.SigmaTitleError: Sigma rule must have a title

If I add a title to that rule, I get the same error, but for the sub-rule with ID 4d0a2c83-c62c-4ed4-b475-c7e23a9269b8

Is the title mandatory for sub-rules? If so, I'll update the specification examples, because they are misleading :)

Res260 commented 10 months ago

Also, if the title is mandatory, isn't the name property redundant?

Res260 commented 10 months ago

Another question I have:

When you do something like that:

rule = SigmaCollection.from_yaml(failed_logins_followed_by_successful_login)
result = mybackend.convert(rule)

Where failed_logins_followed_by_successful_login is a string of the YAML documents of the rule mentioned in the original post. There seems to be no easy way for the backend to know which rules to actually convert into actual rules (i.e. only the rule with ID b180ead8-d58f-40b2-ae54-c8940995b9b6). This rule has a rule attribute that references its dependent rules, so why does SigmaCollection.from_yaml() returns 4 rules instead of one? Or, alternatively, is there a way to know which rules are not referenced by any other rule (which would indicate that these rules are the ones to be converted by the backend)?

thomaspatzke commented 10 months ago

Hi!

The example is wrong. The title was always a mandatory attribute but the parser just ignored it. With pySigma 0.11 this is now handled more strictly, but the change was unrelated to correlation rules. Therefore, also the rules referenced from a correlation rule must have a title attribute.

I've also updated the correlation rule example in the specification. Thanks for pointing this out, I wasn't aware about it 😁 You have by the way used an even more outdated example. The ordered attribute doesn't exist anymore. Instead of this, a dedicated correlation type temporal_ordered was introduced.

Also, if the title is mandatory, isn't the name property redundant?

Basically, title is intended for humans while name is for machines 😉 First one might change to be more clear, reflect rule changes or simply fix typos. Because such a dynamic attribute isn't suitable for being used as reference something more static for was required. There is id but it has some readability issues. Therefore, name was introduced and is the preferred way to reference rules in correlations, while id is also possible.

This rule has a rule attribute that references its dependent rules, so why does SigmaCollection.from_yaml() returns 4 rules instead of one?

Depending on the use case it can be necessary that the other rules are also needed independently. To keep things simple I've decided doing it this way, but...

Or, alternatively, is there a way to know which rules are not referenced by any other rule (which would indicate that these rules are the ones to be converted by the backend)?

...yes, there's a simple way to do this 😊 Better two:

Rules now have an attribute _backreferences that references back to the rules referencing this rule.
The flag _output is possibly even more suitable for this purpose because it determines if the rule is intended to be output. The default behavior is that rules referenced by a correlation are disabled for output.

I've just pushed a commit that adds the convenience iteration methods get_output_rules() and get_unreferenced_rules() to SigmaCollection that will be included in the next release.

Res260 commented 10 months ago

Thanks, I just tried it. Two things:

There is a typo in the name of the rule:
When I try to call both of these methods, the output is a list of 4 items (the same size as my SigmaCollection):

If I look closely, the _backreferences property is empty in my rules:

Not quite sure what's happening here. I'm using the latest updated example from the specification.

As a side-note, and for what it's worth :p I personally think having both title and name clutters a sub-rule, and would prefer to be able to use one of them. I don't think a sub-rule's title is more dynamic than it's name.

Res260 commented 10 months ago

Update: after a bit of debugging, the solution is to call mysigmacollection.resolve_rule_references() after creating it. Is it the intended behavior that SigmaCollection.from_yaml() does not do it itself? From my understanding, this means that it's the Backend's job to call this method. That's correct?

thomaspatzke commented 10 months ago

Update: after a bit of debugging, the solution is to call mysigmacollection.resolve_rule_references() after creating it. Is it the intended behavior that SigmaCollection.from_yaml() does not do it itself? From my understanding, this means that it's the Backend's job to call this method. That's correct?

Yes, that's correct. I thought the reference resolution belongs to the conversion process and it's more efficient to keep it out from the rule parsing if someone just needs parsed rules. Perhaps it's better to integrate it in the parsing stage and make it possible to disable...what do you think?

Res260 commented 10 months ago

I'm not sure what's the best default, but I think it'd be natural to have the option to resolve on parsing a rule. If it's not the defaut, I think it should be documented it in SigmaCollection's class docstring.

SigmaHQ / pySigma

[Correlations] Issue with rule parsing #186