StephenOTT / camunda-prometheus-process-engine-plugin

Monitor your KPIs!!! Camunda BPM Process Engine Plugin providing Prometheus Monitoring, Metric classes for various BPMN use, Grafana Annotations, and HTTPServer data export: Used to generate Prometheus metrics anywhere in the Engine, including BPMN, CMMN, and DMN engines and instances.
MIT License

High memory usage #31

Closed: epoddubny closed this issue 4 years ago

epoddubny commented 4 years ago

After adding camunda-prometheus-process-engine-plugin I noticed that my application uses a lot of memory: 2 GB of RAM instead of 400 MB before. I then took a heap dump and found that ProcessInstances.groovy accounts for 80% of the memory.

<dependency>
    <groupId>com.github.StephenOTT</groupId>
    <artifactId>camunda-prometheus-process-engine-plugin</artifactId>
    <version>1.7.2</version>
</dependency>

system:
- collector: io.digitalstate.camunda.prometheus.collectors.camunda.BpmnExecution
  enable: true
  startDate: now
  endDate: now
  startDelay: 60000
  frequency: 300000
- collector: io.digitalstate.camunda.prometheus.collectors.camunda.JobExecutor
  enable: true
  startDate: now
  endDate: now
  startDelay: 60000
  frequency: 300000
custom:
  - collector: classpath:/prometheus/customcollectors/ProcessInstances.groovy
    enable: true
    startDelay: 60000
    frequency: 300000
  - collector: classpath:/prometheus/customcollectors/TimerMetrics.groovy
    enable: true
    startDelay: 60000
    frequency: 300000
  - collector: classpath:/prometheus/customcollectors/IncidentMetricsRuntime.groovy
    enable: true
    startDelay: 60000
    frequency: 300000
StephenOTT commented 4 years ago

@epoddubny how many process instances is your Camunda engine running?

The ProcessInstances Groovy file uses the Camunda API to get a process instance object for every instance, so if you have lots of instances it will eat a lot of memory to process them.

epoddubny commented 4 years ago

@StephenOTT Yes, you're right: I have a lot of running process instances, about 23 thousand, and that is not the limit. But I don't think loading every process instance into memory just to count them is an optimized solution. A single metric ends up eating all the memory, whereas opening Camunda Cockpit and viewing all running processes causes no memory problem.

StephenOTT commented 4 years ago

Yes, agreed, it was not optimized for that. As you scale, and depending on your needs, you can adjust your queries to use GROUP BY queries in custom SQL. The custom query Groovy files are a collection of samples; they are in no way optimized for all use cases.

The simplest solution for you is to disable the ProcessInstances.groovy collector and make a new version that implements a groovy SQL() query. You can access the Camunda DB with something like https://github.com/StephenOTT/camunda-prometheus-process-engine-plugin/issues/29#issue-521234913.

Then you can do a GROUP BY in the SQL against the process instances table and count the instances, for example a count grouped by definition key.
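As a rough illustration of the GROUP BY idea above, here is a minimal sketch in Python using an in-memory SQLite database. It is not the plugin's code. The two tables are simplified, hypothetical stand-ins modelled on Camunda's ACT_RU_EXECUTION and ACT_RE_PROCDEF tables, and the helper names (count_instances_by_key, build_demo_db) are made up for this example. It assumes that a root execution (PARENT_ID_ IS NULL) represents one running process instance; check the table and column names against your own Camunda schema version before reusing the query.

```python
import sqlite3

# Simplified, hypothetical stand-ins for Camunda's runtime tables.
# The real tables have many more columns; only those the query needs are modelled.
SCHEMA = """
CREATE TABLE ACT_RE_PROCDEF (ID_ TEXT PRIMARY KEY, KEY_ TEXT);
CREATE TABLE ACT_RU_EXECUTION (
    ID_ TEXT PRIMARY KEY,
    PROC_INST_ID_ TEXT,
    PARENT_ID_ TEXT,
    PROC_DEF_ID_ TEXT
);
"""

# The counting happens inside the database, so the application receives
# one small row per definition key instead of one object per instance.
COUNT_BY_KEY = """
SELECT d.KEY_ AS definition_key, COUNT(*) AS instance_count
FROM ACT_RU_EXECUTION e
JOIN ACT_RE_PROCDEF d ON d.ID_ = e.PROC_DEF_ID_
WHERE e.PARENT_ID_ IS NULL  -- assumption: root executions are process instances
GROUP BY d.KEY_
ORDER BY d.KEY_
"""


def count_instances_by_key(conn):
    """Return {definition_key: running_instance_count} using a single query."""
    return dict(conn.execute(COUNT_BY_KEY).fetchall())


def build_demo_db():
    """Create an in-memory database with a few made-up rows for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ACT_RE_PROCDEF VALUES (?, ?)",
        [
            ("invoice:1:a", "invoice"),
            ("invoice:2:b", "invoice"),
            ("order:1:c", "order"),
        ],
    )
    conn.executemany(
        "INSERT INTO ACT_RU_EXECUTION VALUES (?, ?, ?, ?)",
        [
            ("e1", "e1", None, "invoice:1:a"),
            ("e2", "e2", None, "invoice:2:b"),
            ("e3", "e3", None, "order:1:c"),
            ("e4", "e3", "e3", "order:1:c"),  # child execution, excluded by the WHERE clause
        ],
    )
    return conn


if __name__ == "__main__":
    for key, count in count_instances_by_key(build_demo_db()).items():
        print(key, count)
```

The SQL string is the part that carries over: in a custom collector the same statement would be run against the Camunda database and each returned row turned into one labelled metric value, so memory use depends on the number of definition keys, not on the number of running instances.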

StephenOTT commented 4 years ago

@epoddubny were you able to give the sql group by a try?

epoddubny commented 4 years ago

@StephenOTT The problem was unused historical metadata. @VlasyukA will write more details later.

VlasyukA commented 4 years ago

@StephenOTT The problem was a huge number of old and test processDefinitions. After cleaning up as described in https://camunda.com/best-practices/cleaning-up-historical-data/, all is OK.

StephenOTT commented 4 years ago

Okay, great. Note that you will hit this problem again if your DB grows to lots of active processes or you keep your history for a long time.

Using the SQL "group by" mitigates this.