I have a job that ran for some 3 months at the correctly allocated time
The job then ran long, so long it likely overlapped itself and dead-locked itself running a database process
The issue was observed and the affected processes killed
While chasing a request for data this AM we reviewed the schedule definition and found that the start time had moved
Reviewing the log that we develop for the process this new scheduled time is the same hour & minute as when the job was killed
I'm going to surmise that the the calculation that is performed to determine the next run time threw a glitch because the very long process time - in excess of 24 hours, actually potentially days before the solution caught up
Obviously this is an extreme example but hopefully this information is of use
I have a job that ran for some 3 months at the correctly allocated time The job then ran long, so long it likely overlapped itself and dead-locked itself running a database process The issue was observed and the affected processes killed While chasing a request for data this AM we reviewed the schedule definition and found that the start time had moved Reviewing the log that we develop for the process this new scheduled time is the same hour & minute as when the job was killed
I'm going to surmise that the the calculation that is performed to determine the next run time threw a glitch because the very long process time - in excess of 24 hours, actually potentially days before the solution caught up
Obviously this is an extreme example but hopefully this information is of use
runtime stats - observe the time change
{ "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-01-29T23:30:00.002918331Z", "number_of_retries": 0, "success": true, "execution_duration": 169104023611 }, { "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-01-31T23:30:00.006173233Z", "number_of_retries": 0, "success": true, "execution_duration": 141530815675788 }, { "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-02-02T14:48:50.821937456Z", "number_of_retries": 0, "success": false, "execution_duration": 2155156376784 }, { "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-02-03T14:48:50.822134093Z", "number_of_retries": 0, "success": true, "execution_duration": 6163850644698 }, { "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-02-04T14:48:50.822309419Z", "number_of_retries": 0, "success": true, "execution_duration": 1148506647346 }, { "job_id": "85a9ad27-c179-42df-4c0d-2bc16012602e", "ran_at": "2017-02-05T14:48:50.822477233Z", "number_of_retries": 0, "success": true, "execution_duration": 1063678685264 },
application log showing the deadlocks and reschedule glitch!
2017/01/31 23:30:00 /home/****/gocode/src/gitlab.****.com/****/****-aws-post/dd_post_process --module=OFFERAUTO >> /h$ 2017/01/31 23:30:00 dd_post_process.go:62: Promotion Module ..........: OFFERAUTO 2017/01/31 23:30:00 dbutil.go:1997: Active Functions Found ....: 1 2017/01/31 23:30:00 dbutil.go:2018: Function ..................: offers.offer_result_dyn_auto() 2017/02/02 14:48:50 dd_post_process.go:208: Fatal error, abending: ERROR: canceling statement due to user request (SQLSTATE 57014) Pinging the bosun server, http://spider.****.com:8070 at 2017-02-02 14:48:50.531523315 +0000 UTC with {"metric":"job.exit_status","timesta$ 2017/02/02 14:48:50 /home/****/gocode/src/gitlab.****.com/****/****-aws-post/dd_post_process --module=OFFERAUTO >> /h$ 2017/02/02 14:48:50 dd_post_process.go:62: Promotion Module ..........: OFFERAUTO 2017/02/02 14:48:50 dbutil.go:1997: Active Functions Found ....: 1 2017/02/02 14:48:50 dbutil.go:2018: Function ..................: offers.offer_result_dyn_auto() 2017/02/03 14:48:51 /home/****/gocode/src/gitlab.****.com/****/****-aws-post/dd_post_process --module=OFFERAUTO >> /ho$ 2017/02/03 14:48:51 dd_post_process.go:62: Promotion Module ..........: OFFERAUTO 2017/02/03 14:48:51 dbutil.go:1997: Active Functions Found ....: 1 2017/02/03 14:48:51 dbutil.go:2018: Function ..................: offers.offer_result_dyn_auto() 2017/02/03 16:31:34 dd_post_process.go:208: Fatal error, abending: ERROR: canceling statement due to user request (SQLSTATE 57014) Pinging the bosun server, http://spider.****.com:8070 at 2017-02-03 16:31:34.594741463 +0000 UTC with {"metric":"job.exit_status","timesta$ 2017/02/04 14:48:50 /home/****/gocode/src/gitlab.****.com/****/****-aws-post/dd_post_process --module=OFFERAUTO >> /ho$ 2017/02/04 14:48:50 dd_post_process.go:62: Promotion Module ..........: OFFERAUTO 2017/02/04 14:48:50 dbutil.go:1997: Active Functions Found ....: 1 2017/02/04 14:48:50 dbutil.go:2018: Function ..................: offers.offer_result_dyn_auto() 2017/02/04 15:07:59 dd_post_process.go:201: Done. 19m8.421136637s elapsed Pinging the bosun server, http://spider.****.com:8070 at 2017-02-04 15:07:59.263383559 +0000 UTC with {"metric":"job.exit_status","timesta$