aws / event-ruler

Event Ruler is a Java library that allows matching many thousands of Events per second to any number of expressive and sophisticated rules.
Apache License 2.0

Support for $and operand #69

Open schenksj opened 1 year ago

schenksj commented 1 year ago

What is your idea?

It would be great to allow an $and operand (overcoming JSON's "last-one-in-wins" semantics for duplicate fields) to enable attaching multiple conditions to a given field. I think this may be possible, given how I interpret the implementation of the $or operand. For example, the rule below would match only values that do not start with "abc", do not end with "123", and are neither "notmyvalue" nor "alsonotmyvalue".

{ "$and": [
       {"myField": [{"anything-but": ["notmyvalue", "alsonotmyvalue"]}]},
       {"myField": [{"anything-but": {"suffix": "123"}}]},
       {"myField": [{"anything-but": {"prefix": "abc"}}]}
   ]
}
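To pin down the intended meaning of the rule above, here is a minimal sketch that models each `anything-but` clause as a Java `Predicate<String>` and the proposed `$and` as "every predicate must hold for the same field value". The class `AndOnSameField` and its helper methods are hypothetical illustrations, not Event Ruler's API or internals.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: the proposed "$and" over one field modelled as a
// conjunction of per-condition predicates. Not Event Ruler's API.
final class AndOnSameField {

    // anything-but with a list of exact values
    static Predicate<String> anythingBut(List<String> excluded) {
        return value -> !excluded.contains(value);
    }

    // anything-but with a suffix
    static Predicate<String> anythingButSuffix(String suffix) {
        return value -> !value.endsWith(suffix);
    }

    // anything-but with a prefix
    static Predicate<String> anythingButPrefix(String prefix) {
        return value -> !value.startsWith(prefix);
    }

    // "$and": every condition must hold for the same field value
    static Predicate<String> and(List<Predicate<String>> conditions) {
        return value -> conditions.stream().allMatch(c -> c.test(value));
    }

    // Mirrors the example rule on "myField"
    static final Predicate<String> RULE = and(List.of(
            anythingBut(List.of("notmyvalue", "alsonotmyvalue")),
            anythingButSuffix("123"),
            anythingButPrefix("abc")));

    public static void main(String[] args) {
        for (String v : List.of("hello", "notmyvalue", "x123", "abcx")) {
            System.out.println(v + " -> " + RULE.test(v));
        }
    }
}
```

Under this reading only "hello" matches; each of the other three values is rejected by a different clause, which is exactly what a single JSON object cannot express today because the repeated "myField" keys would overwrite each other.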

Would you be willing to make the change?

Maybe


baldawar commented 1 year ago

Makes sense. I've heard some murmurs about needing this elsewhere as well. Similar to $or, we'd have to support something like:

{ "topField" : {
    "$and": [
          {"myField": [{"anything-but": ["notmyvalue", "alsonotmyvalue"]}]},
          {"anotherField": [{"anything-but": {"suffix": "123"}}]},
          {"lastField": [{"anything-but": {"prefix": "abc"}}]}
      ]
   } 
}

Some relevant files based on $or

schenksj commented 1 year ago

@baldawar - I suspect the above example doesn't need the $and operator, since the fields all have different names (the existing engine already handles that). The only gap I can find that needs addressing is allowing multiple rules on the SAME field, specifically for suffix/prefix/anything-but-[prefix|suffix]. Do you disagree?

@baldawar @timbray ... Curious to hear your thoughts. From what I can gather, the $or operator implementation essentially just builds "n" separate rules with the same name, one for each permutation of the $or's children. In your view, what is the right way to build an $and operator that allows multiple conditions on the same key? Options I can think of:

1) Modify the parser and the state machines to allow multiple rules on a single input field.
2) Modify the parser to create a separately named secondary rule (think parent-child) for each $and condition, link it to the parent rule, and do a post-order cleanup after matching to reconcile the parent-child relationship (redact the children from what is returned to the caller, and redact the parent if not all of its children are present).

Based on his comments on my last PR, I'm guessing @timbray would see option 2 as a hack, though it's less impactful/risky to the overall codebase.
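To make option 2 more concrete, here is a minimal sketch of only the post-match reconciliation step. It assumes each $and child is registered as its own rule named `<parent>#<index>` and that the caller remembers how many children each parent was split into; the class `AndReconciler`, the `reconcile` method and that naming scheme are all hypothetical, not part of Event Ruler.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of "route 2": each $and child is registered as its own
// rule named "<parent>#<index>"; matches are folded back into the parent here.
final class AndReconciler {

    // childCounts maps a parent rule name to the number of $and children it was split into.
    static List<String> reconcile(Collection<String> matchedRuleNames,
                                  Map<String, Integer> childCounts) {
        Map<String, Set<Integer>> matchedChildren = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String name : matchedRuleNames) {
            int sep = name.lastIndexOf('#');
            String parent = sep < 0 ? name : name.substring(0, sep);
            if (sep < 0 || !childCounts.containsKey(parent)) {
                result.add(name); // ordinary rule: pass it through untouched
                continue;
            }
            // Assumes the suffix after '#' is always a numeric child index.
            int index = Integer.parseInt(name.substring(sep + 1));
            matchedChildren.computeIfAbsent(parent, k -> new HashSet<>()).add(index);
        }
        // Report a parent only if every one of its children matched;
        // the child rules themselves are never returned to the caller.
        for (Map.Entry<String, Set<Integer>> e : matchedChildren.entrySet()) {
            if (e.getValue().size() == childCounts.get(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("excludeAdmins", 3);
        System.out.println(reconcile(
                List.of("excludeAdmins#0", "excludeAdmins#1", "excludeAdmins#2", "other"), counts)); // [other, excludeAdmins]
        System.out.println(reconcile(List.of("excludeAdmins#0", "excludeAdmins#2"), counts));     // []
    }
}
```

The appeal of this route is that the state machine stays untouched; the cost is that every $and child still occupies a rule slot in the machine, and the wrapper has to keep the child-count bookkeeping in sync with rule additions and deletions.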

schenksj commented 1 year ago

I suppose if route 2 was taken, it could similarly support a $not operand, but this might be straining the state-machine architecture a bit too much?

timbray commented 1 year ago

In general, over the history of Ruler, we basically haven't added any feature unless we had some group yelling "we really need this". Who really needs this? What's the scenario?

schenksj commented 1 year ago

@timbray An "$and" use case: in a detection system, false-positive reduction often revolves around defining lists of exclusions. For example, in a signature where you're trying to detect an individual user doing something naughty (abusing a capability intended for system processes), you might have a list of exclusions for the known accounts that are entitled to use that functionality. This could take the form of accounts running administrative tools (exact matches) plus accounts with particular suffixes (like AD designators, identifiers for system users, etc.).

I suppose the fundamental issue here is that the anything-but operator only allows either a list of explicit matches OR a single prefix or suffix, rather than a mix of explicit matches and multiple prefixes/suffixes. The same issue is at the root of the "$not" idea.

timbray commented 1 year ago

Not bad. But I meant actual groups with actual concrete problems they need solutions for right now.

In my career, I've had really bad luck guessing what people need.

schenksj commented 1 year ago

I agree. This is a real use case for me. I can fairly easily create a layer on top of event-ruler to perform these functions in my context, but it seemed high-likelihood that someone else would benefit from the functionality as well. Either way I appreciate the engagement.

NickMoores commented 1 year ago

Just came across this by chance when researching how to combine "prefix" and "suffix" on a field.

E.g. "key": [{ "prefix": "landing/", "suffix": ".xml" }]

I think $and would help solve this?

timbray commented 1 year ago

Depends what you mean by "solve".

I think this syntax is nice and expressive and we can probably figure out how to build a state machine to implement it:

"key": [{ "prefix": "landing/", "suffix": ".xml" }]

It's a bit of a departure from current Rule syntax and raises questions about what you could combine with what, but it's a direction worth investigating, and probably less kludgy than $and?
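As a rough illustration of what the combined matcher `{ "prefix": "landing/", "suffix": ".xml" }` would have to accept, here is a minimal sketch of the check for a single value. The class `PrefixAndSuffix` is hypothetical, and the decision to reject values where the prefix and suffix overlap is an assumption made for illustration; it is exactly the kind of "what combines with what" question the syntax would need to settle.

```java
// Hypothetical sketch of the proposed combined matcher
// { "prefix": "landing/", "suffix": ".xml" }: both conditions must hold for one value.
final class PrefixAndSuffix {

    static boolean matches(String value, String prefix, String suffix) {
        // Assumption: prefix and suffix may not overlap, so prefix "ab" plus
        // suffix "bc" does not accept "abc". One possible design choice only.
        return value.length() >= prefix.length() + suffix.length()
                && value.startsWith(prefix)
                && value.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(matches("landing/feed.xml", "landing/", ".xml"));  // true
        System.out.println(matches("landing/feed.json", "landing/", ".xml")); // false
        System.out.println(matches("abc", "ab", "bc"));                       // false (overlap rejected)
    }
}
```

A plain `startsWith && endsWith` check would accept the overlapping case; a state-machine implementation would have to pick one of the two behaviours explicitly.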

baldawar commented 1 year ago

Thinking about how this would be interpreted for keys with sub-key elements, the syntax isn't that bad (it looks simpler than using an $and matcher):

"key" : [
   { "exists" : true}, 
   "sub-key": [
     { "prefix" : "landing/"},
     { "suffix"  : ".xml" }
   ]
]

It's definitely a big departure from current query syntax, but in line with the rest of Ruler's existing AND behaviour today. There's no new matcher for users to learn, which is a big plus for me.

We'd need to make changes to how we compile rules, starting from this line https://github.com/aws/event-ruler/blob/1b4281e3366534adb9b8dd1544548c14e1817d64/src/main/software/amazon/event/ruler/RuleCompiler.java#L141 . It would also potentially open the door to addressing the caveat with dots in the future.

In any case, definitely don't see any reason to supporting $and.

schenksj commented 1 year ago

@baldawar - One potential point of confusion with the syntax above is that something like the following evaluates to true if any one condition is true, not if all conditions are true. So re-using this syntax might be a breaking change. For example:

{
  "message": [ 
     { "equals-ignore-case": "A" },
     { "suffix": "b" },
     { "prefix": "c" },
     {"exists": false}
  ]
}

{"message": "a"} - matches on rule 1
{"message": "b"} - matches on rule 2
{"message": "c"} - matches on rule 3
{} - matches rule 4
{"message": "d"} - no match
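The outcomes listed above follow from "the field matches if ANY matcher in its array matches". A minimal sketch of that semantics, using `Optional.empty()` to stand for an absent field, reproduces all five cases. The class `AnyOfMatchers` is a hypothetical model written for this thread, not how Ruler implements matching internally.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical model of today's behaviour: a field's array of matchers
// succeeds if ANY single matcher succeeds. Not Ruler's internals.
final class AnyOfMatchers {

    // Optional.empty() stands for "field absent from the event".
    static final List<Predicate<Optional<String>>> MESSAGE_MATCHERS = List.of(
            v -> v.map(s -> s.equalsIgnoreCase("A")).orElse(false), // equals-ignore-case
            v -> v.map(s -> s.endsWith("b")).orElse(false),         // suffix
            v -> v.map(s -> s.startsWith("c")).orElse(false),       // prefix
            v -> !v.isPresent()                                      // exists: false
    );

    static boolean matches(Optional<String> message) {
        return MESSAGE_MATCHERS.stream().anyMatch(m -> m.test(message));
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "c", "d"}) {
            System.out.println(s + " -> " + matches(Optional.of(s)));
        }
        System.out.println("(absent) -> " + matches(Optional.empty()));
    }
}
```

Switching `anyMatch` to `allMatch` here would make every one of these events fail, which is why re-purposing the array syntax for AND would break existing rules.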

schenksj commented 1 year ago

There is actually a related defect here. If you do this:

{
  "message": [ 
     { "equals-ignore-case": "A" },
     { "suffix": "b" },
     { "prefix": "c" },
     { "anything-but": "b to-the c" },
     { "exists": false }
  ]
}

{"message": "b to-the c"} doesn't match, because any anything-but match produces a no-match, in contrast to the any-one-rule behavior without anything-buts. @timbray, is this what you were referring to about the anything-but design being a bit of a hack?

baldawar commented 1 year ago

That's a great catch. Ruler can't ship breaking changes.

Looking back at Nick's comment, I may have interpreted it the wrong way. It might just have been an answer to the "Who really needs this?" question, with no objection to implementing it like this:

{
    "message": {
        "$and": [
            { "prefix": "landing/" },
            { "suffix": ".xml" }
        ]
    }
}

@NickMoores can you confirm?

NickMoores commented 1 year ago

Sorry for the confusion. Can confirm: my intent was to show support for an $and operator.

My example was an alternative syntax I tried for my use case before arriving here. I think it’s expressive, but I’m not read up enough on Ruler to understand whether it aligns with other design choices, so happy to bow to others with experience. I think $and makes sense given the existence of $or.

baldawar commented 1 year ago

From a requirements standpoint, I have two patterns worth testing here:

Simple AND Matching

{ "$and": [
       {"myField": [{"anything-but": ["notmyvalue", "alsonotmyvalue"]}]},
       {"myField": [{"anything-but": {"suffix": "123"}}]},
       {"myField": [{"anything-but": {"prefix": "abc"}}]}
   ]
}

Complex AND & OR Matching

This is purely to stress the functionality and test whether we hit any limitations.

{
  "$or": [
    {
      "$and": [
        { "prefix": "landing/" },
        { "anything-but": { "suffix": ".xml" } }
      ]
    },
    {
      "$and": [
        { "anything-but": { "prefix": "landing/" } },
        { "suffix": ".xml" }
      ]
    }
  ]
}
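One way to sanity-check what this stress-test rule should do is to evaluate it as a plain boolean expression over a single field value. The sketch below assumes that `anything-but` wrapping a matcher simply negates it and that `$and`/`$or` are ordinary conjunction/disjunction; the class `NestedAndOr` and its helpers are hypothetical and say nothing about how Ruler's state machine would implement nesting.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical sketch of nested "$or"/"$and" over one field value, used only
// to pin down the expected results of the stress-test rule.
final class NestedAndOr {

    @SafeVarargs
    static Predicate<String> and(Predicate<String>... terms) {
        return v -> Arrays.stream(terms).allMatch(t -> t.test(v));
    }

    @SafeVarargs
    static Predicate<String> or(Predicate<String>... terms) {
        return v -> Arrays.stream(terms).anyMatch(t -> t.test(v));
    }

    static Predicate<String> prefix(String p) { return v -> v.startsWith(p); }
    static Predicate<String> suffix(String s) { return v -> v.endsWith(s); }

    // Assumption: anything-but around a matcher is plain negation.
    static Predicate<String> anythingBut(Predicate<String> p) { return p.negate(); }

    // Mirrors the complex AND & OR example.
    static final Predicate<String> RULE = or(
            and(prefix("landing/"), anythingBut(suffix(".xml"))),
            and(anythingBut(prefix("landing/")), suffix(".xml")));

    public static void main(String[] args) {
        for (String key : new String[] {"landing/a.json", "archive/b.xml", "landing/c.xml", "archive/d.json"}) {
            System.out.println(key + " -> " + RULE.test(key));
        }
    }
}
```

Read this way, the rule is an exclusive-or: it matches when exactly one of "starts with landing/" and "ends with .xml" holds, so "landing/c.xml" and "archive/d.json" are both rejected. That makes it a good edge case for any implementation that mixes anything-but with $and.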

baldawar commented 1 year ago

Adding an interesting ask around $and here for future discussions.

Rule:

{
   "testList": { "$and" : ["a","b","c"] }
}

Event 1 (MATCHES):

{
   "testList": ["a","b","c","d","e"]
}

Event 2 (SHOULD NOT MATCH):

{
   "testList": ["a","c"]
}
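The requested behaviour amounts to a containment check: the event's array must contain every value listed under `$and`, regardless of order or extra elements. A minimal sketch, with a hypothetical `ArrayContainsAll` class rather than anything in Ruler today:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch of "testList": { "$and": [...] } semantics: the event's
// array must contain every listed value (order and extra elements ignored).
final class ArrayContainsAll {

    static boolean matches(Collection<String> required, Collection<String> eventValues) {
        return new HashSet<>(eventValues).containsAll(required);
    }

    public static void main(String[] args) {
        List<String> rule = List.of("a", "b", "c");
        System.out.println(matches(rule, List.of("a", "b", "c", "d", "e"))); // Event 1: true
        System.out.println(matches(rule, List.of("a", "c")));                // Event 2: false
    }
}
```

Note that this differs from the existing array behaviour, where a rule value matches if any single element of the event array matches it.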

sridhard commented 6 months ago

@baldawar any update on $and operator?

Also can you please tell me whether $or works for the same field name.

{ "$or": [
       {"myField": [{"anything-but": ["notmyvalue", "alsonotmyvalue"]}]},
       {"myField": [{"anything-but": {"suffix": "123"}}]},
       {"myField": [{"anything-but": {"prefix": "abc"}}]}
   ]
}

baldawar commented 6 months ago

any update on $and operator?

No progress has been made yet, though we plan to look at this in late 2024. We don't have a firm date for this to be picked up. That said, if anyone needs it sooner, we're happy to help guide them through the change.

Also can you please tell me whether $or works for the same field name.

It does, though in the future I'd recommend creating a separate issue for unrelated questions, to avoid mixing multiple topics in the same thread.

jdcaperon commented 1 month ago

Adding a +1 with a use case: we are looking to use Event Ruler in an algebra that supports CONTAINS ALL, which, as far as I can tell, is not possible to express within a single rule right now. E.g., given a field elements_used, it is not possible to assert that all of these types are contained:

{
  "elements_used": [
    "shapes", 
    "text", 
    "images"
  ]
}

Moreover, would array support along the lines of CONTAINS EXACTLY cover the full range of set operations (ANY, ALL, EXACTLY)? Although this might be pushing Event Ruler beyond its intended use case.
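For reference, the three array operations mentioned here map onto ordinary set comparisons between the rule's values and the event's array. The sketch below is an assumption about their semantics (in particular, EXACTLY is treated as set equality, ignoring order and duplicates); the `ArraySetOps` class and `Mode` enum are hypothetical and not an existing Ruler feature.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of ANY / ALL / EXACTLY array matching.
// Names and semantics are assumptions, not an existing Ruler feature.
final class ArraySetOps {

    enum Mode { ANY, ALL, EXACTLY }

    static boolean matches(Mode mode, Collection<String> ruleValues, Collection<String> eventValues) {
        Set<String> rule = new HashSet<>(ruleValues);
        Set<String> event = new HashSet<>(eventValues);
        switch (mode) {
            case ANY:     return !Collections.disjoint(rule, event); // at least one shared value
            case ALL:     return event.containsAll(rule);            // every rule value is present
            case EXACTLY: return event.equals(rule);                 // same set; order/duplicates ignored
            default:      throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        List<String> used = List.of("shapes", "text", "images");
        System.out.println(matches(Mode.ALL, List.of("shapes", "text"), used));               // true
        System.out.println(matches(Mode.EXACTLY, List.of("shapes", "text"), used));           // false
        System.out.println(matches(Mode.EXACTLY, List.of("images", "shapes", "text"), used)); // true
    }
}
```

ANY corresponds to how a list of values on an array field behaves in rules today; ALL and EXACTLY are the parts that would need new syntax such as $and.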

baldawar commented 1 month ago

Use-case-wise, supporting $AND fits well with Ruler. CONTAINS EXACTLY/ANY/ANYTHING_BUT/ALL could as well.

In case anyone is curious about progress, there's a branch on this repo where I've backed up the work so far. Ruler can currently parse rules with $AND matchers, but the actual matching part is not fully functional yet.