camunda-community-hub / camunda-8-benchmark

Helper to create benchmarks for Camunda Platform 8 and Zeebe

PI completed metric no longer updated #10

Open gabortega opened 2 years ago

gabortega commented 2 years ago

Hello,

Whilst trying to use the benchmarking tool, we noticed that the _picompleted metric was never updated. This results in no data being displayed on Grafana for both this metric and _picycletime.

Doing some debugging, I found the culprit to be this code section in JobWorker.java:

private void registerWorker(String jobType) {
    long fixedBackOffDelay = config.getFixedBackOffDelay();

    JobWorkerBuilderStep1.JobWorkerBuilderStep3 step3 = client.newWorker()
            .jobType(jobType)
            .handler(new SimpleDelayCompletionHandler(false));

    if (fixedBackOffDelay > 0) {
        step3.backoffSupplier(new FixedBackoffSupplier(fixedBackOffDelay));
    }

    step3.open();
}

where new SimpleDelayCompletionHandler(boolean) is always called with the value false, and thus these workers never report any completed PIs:

        // worker marking completion of process instance via "task-type-complete"
        registerWorker(taskType + "-completed");

        // worker marking completion of process instance via "task-type-complete"
        registerWorker(taskType + "-" + config.getStarterId() + "-completed");

Is this intentional? If so, the Grafana dashboard may need to be updated as multiple graphs are showing up with no data.

shahamit commented 1 year ago

Any luck on this @gabortega? In the above code snippet, I in fact tried passing the flag to the SimpleDelayCompletionHandler constructor as true and running the benchmark tool, but I don't see any change. The Grafana dashboard is still empty, and the completed jobs and processes counters are still 0.

Probably we are missing something very basic. I am surprised that the tool doesn't work out of the box. I have just cloned this repo and modified the application.properties to connect to the local Zeebe cluster.

Thanks

gabortega commented 1 year ago

Hello @shahamit,

We eventually ended up making a fork of this tool and changed much of the code to fit our own needs.

I don't currently have access to the fix I made before our fork, so this is based on what I remember:

For the original fix, I changed the signature of registerWorker(String jobType) to registerWorker(String jobType, boolean flag) and set the flag to true for those two workers and false for all the others.
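From memory, the change looked roughly like this (a sketch reconstructed from the snippet quoted above, not our exact code):

private void registerWorker(String jobType, boolean flag) {
    long fixedBackOffDelay = config.getFixedBackOffDelay();

    // pass the flag through instead of hard-coding false, so that the
    // "-completed" workers actually report completed process instances
    JobWorkerBuilderStep1.JobWorkerBuilderStep3 step3 = client.newWorker()
            .jobType(jobType)
            .handler(new SimpleDelayCompletionHandler(flag));

    if (fixedBackOffDelay > 0) {
        step3.backoffSupplier(new FixedBackoffSupplier(fixedBackOffDelay));
    }

    step3.open();
}

// the two "-completed" workers are registered with true, all other workers with false
registerWorker(taskType + "-completed", true);
registerWorker(taskType + "-" + config.getStarterId() + "-completed", true);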

We also observed that only the default backpressure strategy would produce the required metrics. Since we wanted to test Zeebe with a fixed throughput and still have these metrics, we set all modifiers (i.e., benchmark.startPiReduceFactor and benchmark.startPiIncreaseFactor) to 0, and we did not have to change benchmark.maxBackpressurePercentage (I think). We could roughly set our desired throughput using benchmark.startPiPerSecond, though we didn't necessarily obtain the exact number set in the property.
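As a rough illustration of that setup in application.properties (the numbers are placeholders, not the exact values we used):

# fixed target throughput; the actual rate may differ slightly
benchmark.startPiPerSecond=100
# disable the adaptive start-rate modifiers
benchmark.startPiReduceFactor=0
benchmark.startPiIncreaseFactor=0
# benchmark.maxBackpressurePercentage was left at its default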

shahamit commented 1 year ago

Thanks for your inputs @gabortega. While troubleshooting, if I check out an old revision ('08dc3ba3') I do see the completed jobs counter incremented. There is still no data in Grafana, though. Probably the library upgrades done by the bot broke the application.

berndruecker commented 1 year ago

Sorry folks - no time to look into this right now - but happy to accept a PR if you find the root cause. Happy to be pinged again next month; I hope to have more availability then :-|

falko commented 1 year ago

As a workaround you could look at the Zeebe Grafana Dashboard.

shahamit commented 1 year ago

@falko - the Zeebe Grafana dashboard has a limitation - it cannot report cycle time for process instances that run for more than 10 seconds. The benchmarking tool dashboard can report it, but it probably isn't set up to report metrics for k8s deployments.

befer commented 1 year ago

I am adding this here because the observation fits into the picture. Using the latest image on Kubernetes, I also found that the metrics _picycletime and _picompleted were never updated. This didn't change after building the image myself. Also, the following lines were completely missing from the pod's log:

PI STARTED:     1022178 (+  1680) Last minute rate:  27.8
  Backpressure: 171815 (+   138) Last minute rate:   1.9. Percentage: 6.789 %
PI COMPLETED:   914193 (+  1150) Last minute rate:  20.0. Mean: 126,707. Percentile .95: 132,827. Percentile .99: 143,085

The reason is that the StatisticsCollector (and probably other classes as well) is not properly initialized. During startup, there are a lot of messages like this:

17:46:41.441 [main] INFO  i.c.z.s.c.a.MicrometerMetricsRecorder - Enabling Micrometer based metrics for spring-zeebe (available via Actuator)
17:46:41.441 [main] INFO  o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'micrometerMetricsRecorder' of type [io.camunda.zeebe.spring.client.actuator.MicrometerMetricsRecorder] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)

The StatisticsCollector was also among them.

I understood that this is related to a cyclic dependency with @Autowired and class initialization during application startup, but for someone who is not into all this Spring and Spring Boot stuff, the interdependencies are completely opaque, and I don't have a clue how to fix it.
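For what it's worth, a generic sketch of how such a constructor-injection cycle is often broken in Spring (hypothetical ServiceA/ServiceB beans, not this project's actual classes) is to inject one side lazily:

import org.springframework.context.annotation.Lazy;
import org.springframework.stereotype.Component;

// Hypothetical example only - a generic Spring pattern, not code from this repository.
@Component
class ServiceA {
    private final ServiceB serviceB;

    // @Lazy injects a proxy that resolves ServiceB on first use,
    // breaking the circular constructor dependency between the two beans
    ServiceA(@Lazy ServiceB serviceB) {
        this.serviceB = serviceB;
    }
}

@Component
class ServiceB {
    private final ServiceA serviceA;

    ServiceB(ServiceA serviceA) {
        this.serviceA = serviceA;
    }
}

Whether that applies here depends on how the benchmark's beans actually reference each other, which I haven't dug into.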

I eventually got the benchmark running by checking out a commit from 31 Mar 2022 (before all the Spring-related updates) and building the image from there. I'd really appreciate a fix for the latest version :-)