StephenOTT / camunda-prometheus-process-engine-plugin

Monitor your KPIs!!! Camunda BPM Process Engine Plugin providing Prometheus Monitoring, Metric classes for various BPMN use, Grafana Annotations, and HTTPServer data export: Used to generate Prometheus metrics anywhere in the Engine, including BPMN, CMMN, and DMN engines and instances.
MIT License

Recommended value for camundaReportingIntervalInSeconds #34

Closed: low-on-mana closed this issue 4 years ago

low-on-mana commented 4 years ago

We are considering using this plugin in production. According to this thread (https://forum.camunda.org/t/performance-of-act-ru-meter-log/15722), the current value of 5 seconds causes trouble by inserting a lot of rows. What value is recommended?

On a side note, Camunda stores all its metrics in the table ACT_RU_METER_LOG. How does this plugin relate to that table? From what I see in the code, the plugin overrides the metrics reporter frequency.

StephenOTT commented 4 years ago

The thread you linked uses the plugin, but the poster did not configure it for their needs.

Camunda's built-in metrics are described here: https://docs.camunda.org/manual/7.9/user-guide/process-engine/metrics/#built-in-metrics

The Prometheus plugin for Camunda provides the ability to run queries and report their results to a Prometheus API. See the scripts defined in the YAML config.

low-on-mana commented 4 years ago

Thanks @StephenOTT, that cleared up some doubts. But you didn't say what value is recommended for camundaReportingIntervalInSeconds. Is 5 seconds all right? We have a couple of Camunda instances running behind a load balancer in a Kubernetes cluster. Since these values are fetched from the DB, I am assuming we don't have to scrape all pods; we can just scrape any one of the pods behind the service.

StephenOTT commented 4 years ago

@t0il3ts0ap the camundaReportingIntervalInSeconds value is an override of this: https://docs.camunda.org/manual/7.9/user-guide/process-engine/metrics/#metrics-reporter

That is, it overrides the default 15 minutes that Camunda applies.

In practice, in my experience, everyone's needs are quite different, so the specific value to apply depends on your setup and business processes.

You can most likely keep the default 15-minute interval (or longer) rather than something like 5 seconds, as the default metrics are very generic: https://docs.camunda.org/manual/7.9/user-guide/process-engine/metrics/#built-in-metrics
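To put the two intervals side by side, here is a rough back-of-the-envelope sketch in Java. The class and method names are hypothetical, and the assumption that each flush writes at most one row per metric per node is mine, not something confirmed in this thread; treat the row figure as an upper bound for illustration only.

```java
// Hypothetical helper, not part of the plugin or of Camunda.
final class ReportingIntervalMath {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Number of metrics-reporter flushes per day for a single engine node.
    static long flushesPerDay(long intervalSeconds) {
        if (intervalSeconds <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return SECONDS_PER_DAY / intervalSeconds;
    }

    // Upper bound on rows written per day, ASSUMING each flush writes
    // at most one row per metric on every node.
    static long maxRowsPerDay(long intervalSeconds, int nodes, int metricsPerFlush) {
        return flushesPerDay(intervalSeconds) * nodes * metricsPerFlush;
    }

    public static void main(String[] args) {
        System.out.println(flushesPerDay(5));   // 17280 flushes per day at 5 seconds
        System.out.println(flushesPerDay(900)); // 96 flushes per day at 15 minutes
    }
}
```

Under these assumptions, a 5-second interval flushes 180 times as often as the 15-minute default, which is consistent with the row growth reported in the linked forum thread.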

StephenOTT commented 4 years ago

Also: the Camunda metrics (https://docs.camunda.org/manual/7.9/user-guide/process-engine/metrics/#built-in-metrics) are actually collected on each node / engine / "pod". They are stored in a map and sent to the DB at the camundaReportingIntervalInSeconds interval, so you need each node in your cluster to report its metrics if you want a true count. From memory, the default Camunda metrics are not counted by a DB query; they come from an in-memory running count that is kept per node in the cluster and committed to the DB at the camundaReportingIntervalInSeconds interval (or the default 15-minute interval: https://docs.camunda.org/manual/7.9/user-guide/process-engine/metrics/#metrics-reporter).
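The per-node behaviour described above can be illustrated with a toy model in Java. This is only a sketch: `Node` and `SharedLog` are hypothetical stand-ins for an engine's in-memory counter and for the shared metrics table, and none of it is Camunda's actual implementation.

```java
// Toy model only; these classes are hypothetical and do not exist in Camunda.
final class NodeMetricsSketch {

    // Stand-in for one engine node's in-memory running count.
    static final class Node {
        private long pending;

        void increment() {
            pending++;
        }

        // Returns the count accumulated since the last flush and resets it,
        // as a reporter would do once per reporting interval.
        long flush() {
            long value = pending;
            pending = 0;
            return value;
        }
    }

    // Stand-in for the shared metrics table: it only sees flushed values.
    static final class SharedLog {
        private long total;

        void write(long value) {
            total += value;
        }

        long total() {
            return total;
        }
    }
}
```

If node A has counted 3 events and node B has counted 2, the shared total is 3 after only A flushes and reaches 5 only once B flushes as well. In this model, anything a node has not yet reported is invisible to the DB, which is why every node has to report for the total to be complete.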

StephenOTT commented 4 years ago

@t0il3ts0ap note: for production use I would recommend that you set up a standalone engine connected to the DB. This engine would have its job executor disabled and would only be used for running the Prometheus checks. This way you do not create any performance hit on the cluster's processing power.