aws / event-ruler

Event Ruler is a Java library that allows matching many thousands of Events per second to any number of expressive and sophisticated rules.
Apache License 2.0

How to scale and increase availability #119

Closed sridhard closed 11 months ago

sridhard commented 11 months ago

Hi,

I have 3 questions about event-ruler:

  1. Suppose we have a million rules. If a machine or process crashes, do we need to rebuild all the rules, or are the rules persisted to a file that event-ruler reads back on startup?

  2. Suppose the event load is very high and we want to scale event-ruler nodes horizontally. What is the best method?

  3. For multi-tenancy, can we have a separate rule state machine for each tenant, considering 100,000 tenants? By state machine I mean the Machine class in event-ruler.

Thanks

baldawar commented 11 months ago
  1. Ruler doesn't persist rules on disk. Everything runs in memory. If you want persistence, write each rule to disk before adding it, then rebuild Ruler from those stored rules when your process boots after a restart.
  2. How high a load are you thinking about? We've seen Ruler work without any issues at 100K~200K TPS. If you need to support more parallel processing, consider partitioning or sharding your traffic. This means a given slice of your traffic only hits a specific partitioned set of hosts, and you only load the relevant rules within each partition. This is complex, but it allows you to scale horizontally.
  3. For multi-tenant systems, one rule machine per tenant is great. EventBridge does that, and it works for a lot more tenants. If you can lazy-load rules when customers are driving traffic, you might even be able to support them with less hardware.
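
To make point 1 concrete, here is a minimal sketch of "write to disk first, replay on boot". It is not part of event-ruler: `RuleSink` and `DurableRuleStore` are hypothetical names, and the log format (one Base64-encoded name/JSON pair per line) is just an assumption chosen to keep the example short. In a real setup the sink would most likely delegate to `Machine.addRule(name, json)`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Base64;

/** Hypothetical stand-in for the in-memory engine (e.g. a wrapper around event-ruler's Machine). */
interface RuleSink {
    void addRule(String name, String json) throws Exception;
}

/** Appends each rule to a log file before adding it to the in-memory engine. */
final class DurableRuleStore {
    private final Path log;
    private final RuleSink sink;

    DurableRuleStore(Path log, RuleSink sink) {
        this.log = log;
        this.sink = sink;
    }

    /** Persist first (synchronous append), then add the rule to memory. */
    void addRule(String name, String json) throws Exception {
        String line = encode(name) + " " + encode(json) + System.lineSeparator();
        Files.write(log, line.getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND, StandardOpenOption.SYNC);
        sink.addRule(name, json);
    }

    /** Replays the log into the engine at startup; returns the number of rules restored. */
    int restore() throws Exception {
        if (!Files.exists(log)) {
            return 0;
        }
        int count = 0;
        for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
            if (line.isEmpty()) {
                continue;
            }
            // Base64 never contains a space, so the first space separates name from JSON.
            String[] parts = line.split(" ", 2);
            sink.addRule(decode(parts[0]), decode(parts[1]));
            count++;
        }
        return count;
    }

    private static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String decode(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }
}
```

On boot you would create a fresh engine, wrap it in a `RuleSink`, and call `restore()` before accepting events. Rule deletion and log compaction are left out of this sketch; with a million rules a database or object store is probably a better fit than a flat file.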

There are a fair number of nuances around multi-tenant architectures that I didn't go into in depth here. I'd strongly recommend taking a look at these two resources to learn more
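
As a rough illustration of the partitioning and lazy-loading ideas from points 2 and 3, the sketch below keeps one machine per tenant, builds it on first use, and maps each tenant to a partition by hash. `TenantMachines` and `TenantPartitioner` are hypothetical names, not event-ruler classes, and the generic `M` stands in for event-ruler's `Machine` so the example has no dependencies.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical lazy per-tenant registry; M would be event-ruler's Machine in practice. */
final class TenantMachines<M> {
    private final Map<String, M> machines = new ConcurrentHashMap<>();
    private final Function<String, M> loader;

    /** The loader builds a tenant's machine from wherever its rules are stored. */
    TenantMachines(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Builds the tenant's machine on first use, then reuses it. */
    M forTenant(String tenantId) {
        return machines.computeIfAbsent(tenantId, loader);
    }

    /** Drops an idle tenant's machine to free memory; it is rebuilt on the next lookup. */
    void evict(String tenantId) {
        machines.remove(tenantId);
    }

    int loadedCount() {
        return machines.size();
    }
}

/** Maps a tenant to one of N partitions so each host group loads only its own tenants' rules. */
final class TenantPartitioner {
    private final int partitions;

    TenantPartitioner(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.partitions = partitions;
    }

    /** floorMod keeps the result in [0, partitions) even for negative hash codes. */
    int partitionFor(String tenantId) {
        return Math.floorMod(tenantId.hashCode(), partitions);
    }
}
```

With event-ruler, the loader would presumably create a `Machine`, call `addRule(name, json)` for each of the tenant's stored rules, and the caller would then match with `rulesForJSONEvent(eventJson)`. Plain modulo hashing reshuffles most tenants when the partition count changes, so consistent hashing may be worth considering if partitions are added often.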

sridhard commented 11 months ago

Thanks a lot.

We are currently on AWS and wanted to use EventBridge, but AWS limits the number of rules in EventBridge.