CacheControl / json-rules-engine

A rules engine expressed in JSON
ISC License

Slow performance while having large array of facts. #324

Open jyothis-qb opened 1 year ago

jyothis-qb commented 1 year ago

I have integrated json-rules-engine into a project I am working on, and the performance seems much slower than I would expect. I'm using the package to do a simple lookup against another set of facts.

const lookupFacts = [
    {  col1: '', col2: '', col3: ''},
    {  col1: '', col2: '', col3: ''},
    {  col1: '', col2: '', col3: ''}
]

const filterRule = {
  conditions: {
    all: [
      {
        path: "$.col1",
        fact: "data",
        value: {
          path: "$.col1",
          fact: "lookup"
        },
        operator: "equal"
      },
      {
        path: "$.col2",
        fact: "data",
        value: {
          path: "$.col2",
          fact: "lookup"
        },
        operator: "equal"
      },
      {
        path: "$.col3",
        fact: "data",
        value: {
          path: "$.col3",
          fact: "lookup"
        },
        operator: "equal"
      }
    ]
  },
  event: {
    type: 'filter-event'
  }
}

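const { Engine } = require('json-rules-engine')

const engine = new Engine()
engine.addRule(filterRule) // the rule must be registered before engine.run() can emit its event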

let filteredMatchs = await Promise.all(lookupFacts.map((lookup) => {
  return engine.run({ lookup, data })
    .then(({ events }) => events.length > 0 ? lookup : false)
    .catch((err) => false)
}))
  .then((values) => values.filter((value) => value))

In my case, the lookupFacts array contains about 80,000 entries and the run takes around 30,000 ms, whereas doing the same comparison in plain JavaScript takes only about 10-15 ms.
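For reference, the plain JavaScript comparison mentioned above might look roughly like the sketch below. The thread does not show the actual code or the `data` object, so the sample values and the `filterMatches` helper are assumptions made for illustration.

```javascript
// Hypothetical sample inputs; the real `data` object is not shown in the thread.
const data = { col1: 'a', col2: 'b', col3: 'c' };
const lookupFacts = [
  { col1: 'a', col2: 'b', col3: 'c' },
  { col1: 'a', col2: 'x', col3: 'c' },
  { col1: 'z', col2: 'b', col3: 'c' },
];

// Columns compared by the three `equal` conditions in the rule.
const columns = ['col1', 'col2', 'col3'];

// Keep a lookup row only when every column strictly equals the same column in `target`.
function filterMatches(rows, target) {
  return rows.filter((row) => columns.every((col) => row[col] === target[col]));
}

const filteredMatches = filterMatches(lookupFacts, data);
console.log(filteredMatches.length); // prints 1
```

This runs a single synchronous pass over the array, with no promises and no path parsing per row, which is consistent with the large gap in timings reported here.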

I will not have any dynamic data in the flow. Is there a way to improve performance?

Thanks

mjaniko commented 1 year ago

@jyothis-qb Any solution for speeding up performance?

chris-pardy commented 1 year ago

@mjaniko @jyothis-qb I would need to see the actual rules to get a sense of the exact cause. However, I can say that if you're using JSON path expressions to turn an array of 80,000 objects into an array of 80,000 values, then it's going to be slow.

Generally, you may be better off flattening/normalizing the data so there are no path expressions to evaluate.
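One way to read the flattening suggestion is sketched below: copy each column to a top-level fact so that conditions compare facts directly and need no `path` attribute. The helper names (`flattenFacts`, `flatConditions`) and the `data_col1` naming scheme are hypothetical, not part of json-rules-engine.

```javascript
// Hypothetical helper: copy each column of the named objects to a top-level
// fact such as `data_col1`, so rule conditions need no `path` attribute.
function flattenFacts(sources) {
  const flat = {};
  for (const [name, row] of Object.entries(sources)) {
    for (const [col, value] of Object.entries(row)) {
      flat[`${name}_${col}`] = value;
    }
  }
  return flat;
}

// Build one path-free `equal` condition per column, comparing fact to fact.
function flatConditions(columns) {
  return columns.map((col) => ({
    fact: `data_${col}`,
    operator: 'equal',
    value: { fact: `lookup_${col}` },
  }));
}

// Sample values are made up for the demonstration.
const facts = flattenFacts({
  data: { col1: 'a', col2: 'b' },
  lookup: { col1: 'a', col2: 'x' },
});
const rule = {
  conditions: { all: flatConditions(['col1', 'col2']) },
  event: { type: 'filter-event' },
};

console.log(facts);
console.log(JSON.stringify(rule.conditions.all[0]));
```

The flattened `facts` object would then be passed to `engine.run(facts)` in place of `{ lookup, data }`.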

JeffrinCh commented 12 months ago

I'm also facing the same issue. In my case each object is simple and has no nested paths, but the array of objects is huge.

Was there any resolution on this?

chris-pardy commented 12 months ago

@JeffrinCh Again, I would need to see a specific example to understand exactly what is happening, but if you're using the path attribute you're going to see some drop in performance, because the fact values that result from applying the path transformation are not cached. @CacheControl there may be an opportunity to do some caching of the JSON path that I can look into. @JeffrinCh for now, one option would be to create dynamic facts instead of using paths. These would benefit from caching and remove the need to parse JSON path expressions.

chris-pardy commented 11 months ago

@jyothis-qb @JeffrinCh I did some digging and here are my suggestions:

[Screenshot, 2023-10-12 at 10:03:35 AM]

This shows a comparison of runtime across 10,000 executions of calling Almanac.factValue on a fresh Almanac instance, so no caching is in effect. The big takeaway is that involving the path parameter causes a slowdown on every access, and that adds up.

In order to speed up your access you could create dynamic facts:

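const { Fact } = require('json-rules-engine');

// Dynamic fact that exposes `fact.col1` directly, with no JSON path involved.
engine.addFact(
  new Fact('factCol1', async (_, almanac) => {
    const f = await almanac.factValue('fact');
    return f.col1;
  })
);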

If you're doing lots of path access, you could simplify this by creating a single fact and using parameters:

// One parameterized fact: `col` is taken from the condition's params.
engine.addFact(
  new Fact('factCol', async ({ col }, almanac) => {
    const f = await almanac.factValue('fact');
    return f[`col${col}`];
  })
);

// access your fact with
{
   "fact": "factCol",
   "params": { col: 1 }
   ...
}

This performs slightly better than using the path value, but it does not give quite the same benefit as a dedicated dynamic fact per column.
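When there are many columns, the dedicated dynamic facts could be generated in a loop instead of written by hand. The following is a minimal sketch under that assumption: `columnFacts` and the `data_col1` naming scheme are hypothetical, and the stub almanac exists only so the snippet runs without the engine.

```javascript
// Hypothetical helper: build one dynamic-fact definition per column.
function columnFacts(sourceFact, columns) {
  return columns.map((col) => ({
    id: `${sourceFact}_${col}`,
    // Same (params, almanac) signature as the dynamic facts shown earlier.
    resolve: async (_params, almanac) => {
      const row = await almanac.factValue(sourceFact);
      return row[col];
    },
  }));
}

const definitions = columnFacts('data', ['col1', 'col2', 'col3']);

// With a real engine, each definition would be registered roughly like this:
// for (const { id, resolve } of definitions) engine.addFact(id, resolve);

// Stand-in almanac for demonstration only; the engine supplies the real one.
const stubAlmanac = {
  factValue: async (name) => ({ data: { col1: 'a', col2: 'b', col3: 'c' } }[name]),
};

definitions[1]
  .resolve({}, stubAlmanac)
  .then((value) => console.log(definitions[1].id, value)); // prints "data_col2 b"
```

Conditions would then reference `data_col1`, `data_col2`, and so on by fact name, with no `path` or `params` needed.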

CacheControl commented 11 months ago

json-path requires a decent amount of overhead; it's a relatively complex spec. I'm not surprised that using the path feature on a large number of items is causing significant impact.

It's my belief that the underlying json-path library we use (jsonpath-plus) is well optimized for performance; however, if there is a performance improvement to be made, it will reside in that library.

I agree with the workaround above of using dynamic facts in place of json-path.

chris-pardy commented 11 months ago

@CacheControl I also did some digging/profiling of jsonpath-plus, and it does seem to be very well optimized. It already caches the result of compiling a path into a function, so repeated uses of the same path do not cause recompilation. The performance is so optimized that even without the caching the behavior is only slightly abnormal.

I'd suggest this could probably be closed, with dynamic facts being the solution.
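To illustrate the kind of caching described here, the sketch below memoizes a toy path compiler so that each distinct path string is compiled only once. It is not the jsonpath-plus implementation; the `compilePath` function handles only simple `$.a.b` paths and is an assumption made for the example.

```javascript
// Illustrative only: a toy compiler for simple `$.a.b` paths, not jsonpath-plus.
const compiledPaths = new Map();
let compileCount = 0;

function compilePath(path) {
  // Reuse the accessor if this exact path string was compiled before.
  if (compiledPaths.has(path)) return compiledPaths.get(path);
  compileCount += 1;
  const keys = path.replace(/^\$\.?/, '').split('.').filter(Boolean);
  const accessor = (obj) =>
    keys.reduce((value, key) => (value == null ? undefined : value[key]), obj);
  compiledPaths.set(path, accessor);
  return accessor;
}

const rows = [{ col1: 'a' }, { col1: 'b' }, { col1: 'c' }];
const values = rows.map((row) => compilePath('$.col1')(row));

console.log(values, compileCount); // prints [ 'a', 'b', 'c' ] 1
```

Even with compilation cached, the accessor still runs once per row, so some per-access cost remains.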

JeffrinCh commented 11 months ago

Will try these and check

iay25 commented 6 months ago

Hi @CacheControl @chris-pardy @JeffrinCh, we are using json-rules-engine in our project to get an outcome by evaluating around 10k records stored in MongoDB. I am sharing one rule for your reference. We have 10k records like this in our project, and evaluating them against the facts takes around 10-15 seconds.

{
  "conditions": {
    "all": [
      { "fact": "customer_delivery_address", "operator": "equal", "factLabel": "Customer Delivery Address", "value": "GB", "valueSet": [ { "value": "GB", "label": "GB" } ] },
      { "fact": "customer_tier", "operator": "equal", "factLabel": "Customer Tier", "value": "gold", "valueSet": [ { "value": "gold", "label": "Gold" } ] },
      { "fact": "new_customer", "operator": "isBoolean", "factLabel": "New Customer", "value": true, "valueSet": [ { "value": true, "label": true } ] },
      { "fact": "order_amount", "operator": "greaterThan", "factLabel": "Order Amount", "value": 2500, "valueSet": [ { "value": 2500, "label": "2500" } ] },
      { "fact": "order_count", "operator": "lessThan", "factLabel": "Order Count", "value": 100, "valueSet": [ { "value": 100, "label": "100" } ] },
      { "fact": "order_date", "operator": "isDateGreaterThan", "factLabel": "Order Date", "value": "2024-02-01T05:15:44Z", "valueSet": [ { "value": "2024-02-01T05:15:44Z", "label": "2024-02-01T05:15:44Z" } ] },
      { "fact": "order_date", "operator": "isDateLessThan", "factLabel": "Order Date", "value": "2024-03-01T05:10:50Z", "valueSet": [ { "value": "2024-03-01T05:10:50Z", "label": "2024-03-01T05:10:50Z" } ] },
      { "fact": "order_state", "operator": "equal", "factLabel": "Order State", "value": "confirmed", "valueSet": [ { "value": "confirmed", "label": "Confirmed" } ] },
      { "fact": "payment_state", "operator": "equal", "factLabel": "Payment State", "value": "paid", "valueSet": [ { "value": "paid", "label": "Paid" } ] },
      { "fact": "customers", "operator": "equal", "factLabel": "Customers", "value": "sample@gmail.com", "valueSet": [ { "value": "sample@gmail.com", "label": "sample@gmail.com" } ] }
    ]
  },
  "event": {
    "type": "categories",
    "params": { "label": "Category", "value": "Fitness Kit", "key": "8173dfd1-d8a1-417d-ab78-07dfa6799f59", "operator": "is", "source": "resource" }
  }
}

Can you please help me understand why it takes so much time to evaluate this type of rule? If possible, please also share solutions for improving performance.