Closed tomaszdurka closed 8 years ago
I can always help!
As discussed, this should add a delay to failing jobs and double that delay every time the job fails again (with a maximum of 1 minute).
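A minimal sketch of the proposed doubling back-off, assuming a hypothetical helper `nextRetryDelay` and an assumed 1-second initial delay (neither is in the actual cm-janus code):

```javascript
// Hypothetical back-off helper: delay doubles with each failure,
// capped at 60 seconds as discussed above.
function nextRetryDelay(failureCount, baseDelay, maxDelay) {
  baseDelay = baseDelay || 1000;  // 1s initial delay (assumed)
  maxDelay = maxDelay || 60000;   // 1 minute cap
  return Math.min(baseDelay * Math.pow(2, failureCount), maxDelay);
}
```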
Well, there is already a 60-second delay:
function JobProcessor(workingDirectory, concurrency, retryDelay) {
//...
this._retryDelay = retryDelay || 60000;
}
this._processor = new JobProcessor(tempFilesDir);
Seems the delay doesn't work as expected. See multiple jobs with the same job id logged at the same time: [Removed image]
UPDATE: Actually this was a wrong assumption. There were just multiple jobs failing with the same jobData.id, which is stream.id (not the jobId; it seems we have a bug which wipes jobId from janus-context, https://github.com/cargomedia/cm-janus/issues/280).
retryDelay is already high (60 seconds by default), and introducing back-off would complicate the logic too much.
Instead (as discussed with @vogdb) we decided to add a configuration option, retryDelay, for the job processor.
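A hedged usage sketch of what passing the new retryDelay option through to JobProcessor could look like; the constructor mirrors the snippet above, and the 5-minute value and the plumbing are assumptions, not the actual implementation:

```javascript
// Constructor as shown earlier in the thread; retryDelay defaults to 60s.
function JobProcessor(workingDirectory, concurrency, retryDelay) {
  this._retryDelay = retryDelay || 60000;
}

// Hypothetical: configure a 5-minute retry delay instead of the default.
var fiveMinutes = 5 * 60 * 1000;
var processor = new JobProcessor('/tmp/jobs', 1, fiveMinutes);
```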
It might seem that starting with a delay of 5 minutes is a bit harsh, but we don't actually have cases where a job fails and then succeeds; currently there are only cases where it fails forever. Still, let's keep the job failing as a reminder of the problem.
Then we can increase it wherever the logging quota is a problem.
@tomaszdurka please review
Looks good, but tests are failing.
@tomaszdurka please look
lgtm
Could you do this soon? Maybe Aleksei can help?
We're reaching capacity at Loggly again.