Sorting maps for more flexible rule matching

decibelcooper commented 4 years ago

Hey there @jsccast

I'm more and more frequently bumping up against limitations in the rule matching due to arrays of maps not being sortable. My thought is that it is worth implementing some sort of sorting procedure for maps, even if it isn't super fast. Do you have any thoughts on this?

Thanks!

decibelcooper commented 4 years ago

E.g., something as simple as doing string comparison on fmt.Sprintf("%v", ...) output. I'm playing around with this right now as a means of prototyping a rule that I really want to match on arrays of maps.

jsccast commented 4 years ago

Can you give a specific example or two where the current behavior is not ideal?

decibelcooper commented 4 years ago

Consider the rule

{
    "id": "asdf",
    "rule": {
        "when": {
            "pattern": {
                "things": [
                    {
                        "id": "?id",
                        "data": "?data"
                    }
                ]
            }
        },
        "action": {
            "code": "Env.out({id: id, data: data})"
        }
    }
}

and the incoming event

{
    "things": [
        {
            "id": "1",
            "data": "asdf"
        },
        {
            "id": "2",
            "data": "blah"
        }
    ]
}

When the above event is ingested at a location containing the rule, I get the error

slice []interface {}{map[string]interface {}{"data":"asdf", "id":"1"}, map[string]interface {}{"data":"blah", "id":"2"}} is not sortable

NOTE: since go 1.12, maps are formatted deterministically: https://tip.golang.org/doc/go1.12#fmt

decibelcooper commented 4 years ago

The linked WIP PR is what I'm playing around with. So far so good. But I have not given the implications a ton of thought yet. Just wanted to get your thoughts first.

jsccast commented 4 years ago

I'm not sure, but it's possible that limitation was removed in the Sheens pattern matching implementation:

patmatch -m '[{"a":1},{"a":2}]' -p '[{"a":"?n"}]'

gives

[{"?n":1},{"?n":2}]

If so and if other behavior is acceptable, it might be possible to use that newer implementation here. That Sheens pattern matcher started as a rewrite of the Rulio pattern matcher.

decibelcooper commented 4 years ago

@jsccast the rulio pattern matching in general does not have significant limitations. It is rule matching specifically that does. For that, I see that arrays are required to be sorted, presumably for efficiency. For pattern matching of facts, no such limitation exists.

jsccast commented 4 years ago

Ah, okay. At least at some point (and perhaps still today), rule dispatch was optionally indexed, and that capability might have imposed an ordering requirement even when indexing wasn't actually used. Not at all sure.

decibelcooper commented 4 years ago

@jsccast thank you so much for clarifying that. I was able to switch to linear/unindexed state and this resolves my problem! Can you please explain the potential drawbacks of using the linear state?

jsccast commented 4 years ago

Pure performance/overhead trade-off, I think. The linear search is a naive scan of all patterns. Unless you have a lot of them (>100?) in the target set of rules or unless the patterns are complicated, this naive approach will probably perform acceptably. That index business basically builds and maintains a special index for possible patterns. In theory and sometimes in practice, the speed improvements for rule dispatch are worth the cost of updating the index as the set of possible rules changes.

Bottom line: Probably use the (linear) scan unless and until you think you need better time performance for rule dispatch? That guideline should probably be documented somewhere.

decibelcooper commented 4 years ago

@jsccast perfect. Just wanted to make sure I had a correct understanding. Feel free to close this issue or keep it around for documentation work or something.

Comcast / rulio

Sorting maps for more flexible rule matching #62