camunda / camunda-bpm-platform

Flexible framework for workflow and decision automation with BPMN and DMN. Integration with Quarkus, Spring, Spring Boot, CDI.
https://camunda.com/
Apache License 2.0
4.03k stars 1.53k forks source link

Improve job acquisition job execution logging #2666

Open ThorbenLindhauer opened 2 years ago

ThorbenLindhauer commented 2 years ago

This issue was imported from JIRA:

Field Value
JIRA Link CAM-14334
Reporter @daniel-ewing
Has restricted visibility comments true

Environment (Required on creation):

Camunda Platform, 2 or more node cluster.

Description (Required on creation; please attach any relevant screenshots, stacktraces, log files, etc. to the ticket):

As described in SUPPORT-12684, with:

It is possible for the nodes to become locked in a cycle of constantly processing the same jobs, getting OLEs, and produce debug logging like:

2022-01-28 11:58:59.849 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14012 Job acquisition thread woke up
2022-01-28 11:58:59.849 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 0 jobs for process engine 'default': <]
2022-01-28 11:58:59.849 DEBUG 1188403 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <32480db8-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.850 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <334d21ec-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.850 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <359d2bb0-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.850 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14011 Job acquisition thread sleeping for 99 millis
2022-01-28 11:58:59.949 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14012 Job acquisition thread woke up
2022-01-28 11:58:59.949 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 0 jobs for process engine 'default': <]
2022-01-28 11:58:59.949 DEBUG 1188403 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <32480db8-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.949 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <334d21ec-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.949 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <359d2bb0-8029-11ec-acaf-005056bf79af>
2022-01-28 11:58:59.949 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14011 Job acquisition thread sleeping for 100 millis
2022-01-28 11:59:00.049 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14012 Job acquisition thread woke up
2022-01-28 11:59:00.049 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 0 jobs for process engine 'default': <]
2022-01-28 11:59:00.049 DEBUG 1188403 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <32480db8-8029-11ec-acaf-005056bf79af>
2022-01-28 11:59:00.050 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <334d21ec-8029-11ec-acaf-005056bf79af>
2022-01-28 11:59:00.050 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': <359d2bb0-8029-11ec-acaf-005056bf79af>
2022-01-28 11:59:00.050 DEBUG 1188403 --- <JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor>] org.camunda.bpm.engine.jobexecutor       : ENGINE-14011 Job acquisition thread sleeping for 99 millis

Steps to reproduce (Required on creation):

Analysis from this comment regarding the scenario that causes the above logging:

Observed Behavior (Required on creation):

The DEBUG logs are unhelpful and a bit confusing:

If 0 jobs were acquired, how were there 3 jobs to execute? Where are the OLE exception errors?

Expected behavior (Required on creation):

Helpful DEBUG logs, including:

Root Cause (Required on prioritization):

Solution Ideas (Optional):

Hints (optional):

Links:

### Pull requests
- [ ] https://github.com/camunda/camunda-bpm-platform/pull/4217
prakashpalanisamy commented 5 months ago

I would like to pick this one to start contributing. Please let me know if that's okay..

yanavasileva commented 5 months ago

Hi @prakashpalanisamy,

Great to hear you would like to make a contribution! We are always open to assist you to do so. Are you interested in this particular problem to resolve? Or you were exploring a good first issue to start? Nevertheless, check our CONTRIBUTING guide as a starting and let me know if you have any questions. Don't forget to link the newly create pull request to the issue that you are working on.

Best, Yana

prakashpalanisamy commented 5 months ago

Hi @yanavasileva ,

Thank you for responding. I found this issue interesting when browsing through the open issues. Going through the guidelines for contribution. Will reachout in case of questions. Thank you!!

prakashpalanisamy commented 4 months ago

Hi @yanavasileva, Hope you are doing good. I was able to setup local camunda clusters (Camunda 7.20) pointing to single postgres DB Instance and was able to reproduce the Issue. Was using a model with Async continuation task calling a delegate which takes more time to complete the task compared to the value in property lock-time-in-millis. Im working with a single process instance of the model and hence a single Job to be executed.

I was able to observe the below from the logs:

The difference that I observed compared to the above logs in the description was, I see that 1 Job was acquired, as opposed to the above, where it states 0 Acquired jobs followed by Execute Job logs. Since that (number of Job acquired part) conflicts with the Observed Behaviour stated above, wanted to check with you.

Attaching my Log snippets below for the reference.


2024-02-21T18:02:10.602+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 1 jobs for process engine 'default': [[3bc2bd0b-d0b5-11ee-ab52-bc03580de904]]
2024-02-21T18:02:10.602+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14023 Execute jobs for process engine 'default': [3bc2bd0b-d0b5-11ee-ab52-bc03580de904]
2024-02-21T18:02:10.603+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13005 Starting command -------------------- AcquireJobsCmd ----------------------
2024-02-21T18:02:10.604+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13009 opening new command context
2024-02-21T18:02:10.649+05:30  INFO 18768 --- [camundaTaskExecutor-1] com.test.delegates.JeDelegate         : Starting the JE Delegate... by camundaTaskExecutor-1
2024-02-21T18:02:15.610+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13006 Finishing command -------------------- AcquireJobsCmd ----------------------
2024-02-21T18:02:15.610+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 0 jobs for process engine 'default': []
2024-02-21T18:02:15.610+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14011 Job acquisition thread sleeping for 9994 millis
2024-02-21T18:02:25.605+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14012 Job acquisition thread woke up
2024-02-21T18:02:25.605+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13005 Starting command -------------------- AcquireJobsCmd ----------------------
2024-02-21T18:02:25.608+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13009 opening new command context
2024-02-21T18:02:25.613+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13006 Finishing command -------------------- AcquireJobsCmd ----------------------
2024-02-21T18:02:25.613+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14022 Acquired 0 jobs for process engine 'default': []
2024-02-21T18:02:25.613+05:30 DEBUG 18768 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14011 Job acquisition thread sleeping for 19992 millis
2024-02-21T18:02:25.681+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.jobexecutor       : ENGINE-14012 Job acquisition thread woke up
2024-02-21T18:02:25.682+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13005 Starting command -------------------- AcquireJobsCmd ----------------------
2024-02-21T18:02:25.684+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13009 opening new command context
2024-02-21T18:02:25.685+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] o.c.b.e.i.p.e.J.selectNextJobsToExecute  : ==>  Preparing: select RES.ID_, RES.REV_, RES.DUEDATE_, RES.PROCESS_INSTANCE_ID_, RES.EXCLUSIVE_ from ACT_RU_JOB RES where (RES.RETRIES_ > 0) and ( RES.DUEDATE_ is null or RES.DUEDATE_ <= ? ) and (RES.LOCK_OWNER_ is null or RES.LOCK_EXP_TIME_ < ?) and RES.SUSPENSION_STATE_ = 1 and ( ( RES.EXCLUSIVE_ = true and not exists( select J2.ID_ from ACT_RU_JOB J2 where J2.PROCESS_INSTANCE_ID_ = RES.PROCESS_INSTANCE_ID_ -- from the same proc. inst. and (J2.EXCLUSIVE_ = true) -- also exclusive and (J2.LOCK_OWNER_ is not null and J2.LOCK_EXP_TIME_ >= ?) -- in progress ) ) or RES.EXCLUSIVE_ = false ) LIMIT ? OFFSET ?
2024-02-21T18:02:25.685+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] o.c.b.e.i.p.e.J.selectNextJobsToExecute  : ==> Parameters: 2024-02-21 18:02:25.685(Timestamp), 2024-02-21 18:02:25.685(Timestamp), 2024-02-21 18:02:25.685(Timestamp), 3(Integer), 0(Integer)
2024-02-21T18:02:25.687+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] o.c.b.e.i.p.e.J.selectNextJobsToExecute  : <==      Total: 0
2024-02-21T18:02:25.687+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.cmd               : ENGINE-13011 closing existing command context
2024-02-21T18:02:25.687+05:30 DEBUG 9060 --- [JobExecutor[org.camunda.bpm.engine.spring.components.jobexecutor.SpringJobExecutor]] org.camunda.bpm.engine.persistence       : ENGINE-03006 Cache state after flush: [
]  

2024-02-21T18:13:25.690+05:30 DEBUG 18768 --- [camundaTaskExecutor-2] org.camunda.bpm.engine.context           : ENGINE-16002 Exception while closing command context: ENGINE-03005 Execution of 'DELETE MessageEntity[3bc2bd0b-d0b5-11ee-ab52-bc03580de904]' failed. Entity was updated by another transaction concurrently.

org.camunda.bpm.engine.OptimisticLockingException: ENGINE-03005 Execution of 'DELETE MessageEntity[3bc2bd0b-d0b5-11ee-ab52-bc03580de904]' failed. Entity was updated by another transaction concurrently.
    at org.camunda.bpm.engine.impl.db.EnginePersistenceLogger.concurrentUpdateDbEntityException(EnginePersistenceLogger.java:139) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.handleConcurrentModification(DbEntityManager.java:412) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperations(DbEntityManager.java:355) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flushDbOperationManager(DbEntityManager.java:323) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.db.entitymanager.DbEntityManager.flush(DbEntityManager.java:295) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.CommandContext.flushSessions(CommandContext.java:272) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.CommandContext.close(CommandContext.java:188) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.CommandContextInterceptor.execute(CommandContextInterceptor.java:119) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.lambda$execute$0(SpringTransactionInterceptor.java:71) ~[camunda-engine-spring-6-7.20.0.jar!/:7.20.0]
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140) ~[spring-tx-6.0.10.jar!/:6.0.10]
    at org.camunda.bpm.engine.spring.SpringTransactionInterceptor.execute(SpringTransactionInterceptor.java:71) ~[camunda-engine-spring-6-7.20.0.jar!/:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.ProcessApplicationContextInterceptor.execute(ProcessApplicationContextInterceptor.java:70) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.CommandCounterInterceptor.execute(CommandCounterInterceptor.java:35) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.LogInterceptor.execute(LogInterceptor.java:33) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.interceptor.ExceptionCodeInterceptor.execute(ExceptionCodeInterceptor.java:55) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobHelper.executeJob(ExecuteJobHelper.java:57) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.executeJob(ExecuteJobsRunnable.java:110) ~[camunda-engine-7.20.0.jar:7.20.0]
    at org.camunda.bpm.engine.impl.jobexecutor.ExecuteJobsRunnable.run(ExecuteJobsRunnable.java:71) ~[camunda-engine-7.20.0.jar:7.20.0]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:840) ~[na:na]```
yanavasileva commented 4 months ago

Hi @prakashpalanisamy,

Sounds great that you're getting started!

The difference that I observed compared to the above logs in the description was, I see that 1 Job was acquired, as opposed to the above, where it states 0 Acquired jobs followed by Execute Job logs.

You do have the Acquired 0 jobs occurrences in the output after the first acquisition. The snippet in the description just doesn't contain the very first acquisitions, therefore only the following 0 jobs acquired is there. In other words, that's expected and no need to worry, you can continue with next steps.

Best, Yana

prakashpalanisamy commented 3 months ago

@yanavasileva : I have created the below PR for the above Issue. Requesting you to please take a look. Thanks!

https://github.com/camunda/camunda-bpm-platform/pull/4217