[WIP] Feature/job chaining

basz commented 10 years ago

Sometimes it is important that one job is executed only after another job has completed. This can be hard to guarantee when more than one worker is handling a queue. This POC PR introduces a simple way to enforce such a requirement.

AbstractJob::chainJob($job);

I call this for now a 'chain'. Hence chainJob. Perhaps appendJob would be better.

$job = $jobPluginManager->get('MyLongRunningJob');
$chainedJob = $jobPluginManager->get('DoSomethingWithResultJob');
$job->chainJob($chainedJob);
$queue->push($job);

These jobs are then added to an array stored as private metadata on the job itself. And thus they become part of the payload.

Upon job completion any jobs stored in the chain are added to the queue in the same order as they were added (array_shift operation).
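A minimal sketch of that mechanism, under assumptions: `ChainableJob`, `InMemoryQueue`, `releaseChain()` and the `NotifyJob` name are hypothetical stand-ins for illustration and are not SlmQueue's actual classes or this PR's code. It reads "added in the same order" literally: on completion every chained job is shifted off the chain and pushed onto the queue.

```php
<?php
// Hypothetical stand-ins for illustration; these are not SlmQueue's real classes.

final class InMemoryQueue
{
    /** @var ChainableJob[] */
    private array $jobs = [];

    public function push(ChainableJob $job): void
    {
        $this->jobs[] = $job;
    }

    public function pop(): ?ChainableJob
    {
        // array_shift() returns null when the array is empty
        return array_shift($this->jobs);
    }
}

class ChainableJob
{
    public string $name;

    /** @var ChainableJob[] jobs to enqueue once this job has completed */
    private array $chain = [];

    public function __construct(string $name)
    {
        $this->name = $name;
    }

    public function chainJob(ChainableJob $job): void
    {
        $this->chain[] = $job;
    }

    /** Push the chained jobs onto the queue in the order they were added. */
    public function releaseChain(InMemoryQueue $queue): void
    {
        while (($next = array_shift($this->chain)) !== null) {
            $queue->push($next);
        }
    }
}

$queue = new InMemoryQueue();
$job = new ChainableJob('MyLongRunningJob');
$job->chainJob(new ChainableJob('DoSomethingWithResultJob'));
$job->chainJob(new ChainableJob('NotifyJob')); // hypothetical third job
$queue->push($job);

// Simplified single-worker loop: "execute" the job, then release its chain.
$executed = [];
while (($current = $queue->pop()) !== null) {
    $executed[] = $current->name; // stand-in for actually executing the job
    $current->releaseChain($queue);
}
```

In this sketch the chained jobs only reach the queue after the parent has finished, so no worker can pick them up early, however many workers are polling.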

Why wouldn't I just add DoSomethingWithResultJob from within the MyLongRunningJob you say? Well that requires me to add any content for the DoSomethingWithResultJob to the MyLongRunningJob. But perhaps more importantly a hardcoded dependency now exists between two (and possibly more) functionally unrelated jobs.

Further ideas:

specify an option indicating when the chained job should be added (on JOB_STATUS_SUCCESS, on JOB_STATUS_FAILURE, or on both)
specify the queueName to which chained jobs are added (processing and then emailing might be handled by different queues)
adding chained jobs should probably be handled by a strategy other than the ProcessQueueStrategy
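The first two ideas could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ChainEntry`, its `ON_*` constants, `planRelease()` and the job and queue names do not exist in SlmQueue or in this PR.

```php
<?php
// Hypothetical sketch; none of these names exist in SlmQueue.

final class ChainEntry
{
    public const ON_SUCCESS = 'success';
    public const ON_FAILURE = 'failure';
    public const ON_BOTH    = 'both';

    public string $jobName;
    public string $when;
    public ?string $queueName;

    public function __construct(string $jobName, string $when = self::ON_SUCCESS, ?string $queueName = null)
    {
        $this->jobName   = $jobName;
        $this->when      = $when;
        $this->queueName = $queueName; // null means "same queue as the parent job"
    }

    public function shouldRelease(bool $succeeded): bool
    {
        if ($this->when === self::ON_BOTH) {
            return true;
        }

        return $succeeded
            ? $this->when === self::ON_SUCCESS
            : $this->when === self::ON_FAILURE;
    }
}

/**
 * Decide which chained jobs to add, grouped by target queue name.
 *
 * @param ChainEntry[] $chain
 * @return array<string, string[]>
 */
function planRelease(array $chain, bool $succeeded, string $defaultQueue): array
{
    $plan = [];
    foreach ($chain as $entry) {
        if ($entry->shouldRelease($succeeded)) {
            $plan[$entry->queueName ?? $defaultQueue][] = $entry->jobName;
        }
    }

    return $plan;
}

$chain = [
    new ChainEntry('GeneratePdfJob'),
    new ChainEntry('SendEmailJob', ChainEntry::ON_SUCCESS, 'mail'),
    new ChainEntry('ReportFailureJob', ChainEntry::ON_FAILURE, 'mail'),
];

$onSuccess = planRelease($chain, true, 'default');
$onFailure = planRelease($chain, false, 'default');
```

With these inputs a successful parent releases `GeneratePdfJob` to the default queue and `SendEmailJob` to the `mail` queue, while a failed parent releases only `ReportFailureJob`.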

Complications:

A job might fail, especially with an exception. I haven't given much thought to buried jobs. For now I just ignore those. If you successfully recover a job, any chained ones will be added upon completion.

Curious to hear your thoughts...

basz commented 9 years ago

Hi @bakura10 @eddiejaoude @juriansluiman I would like to get this feature going. Do you have any thoughts on such an implementation? For or against?

Again, the main motivation is that it makes it simpler to guarantee the jobs' execution order without adding this responsibility to an actual job... you know, $this->getQueue()->push(new JobB()) inside a JobA.

bakura10 commented 9 years ago

I'm not a big fan of the idea tbh. I can understand the rationale, especially for queue providers like SQS that do not guarantee FIFO order.

Now, this encourages putting more logic into the job. I'm now used to having very lightweight jobs. My jobs basically have a dependency on one or multiple services, and call them to do the hard work. This encourages the better practice of keeping the logic inside the services, and removes the problem you are introducing around errors (if one job in the chain fails, then the whole job is reinserted again).

I actually have this use case in one of my jobs (one job generating multiple sub-jobs), and I do the insert right in the job. After all, if you pack many job executions into a single job, you lose the advantage of scalability, as jobs can be run in parallel.

I'll wait for others but I'm not a big fan for now.

basz commented 9 years ago

After all, if you pack many job executions into a single job, you lose the advantage of scalability, as jobs can be run in parallel.

My problem is exactly this. I can't run multiple workers processing the same queue, because I have no guarantee that Job A has been run before Job B commences. And I don't want to complicate Job A with knowledge about Job B, C, D, etc...

Concrete use case: I have a doctrine listener that queues several jobs when some record changes 'state'. It needs to process some uploaded JPG (job) and then generate a PDF (job) and then (and only then) copy some files (inc. the PDF and JPG) to a Dropbox FS (job).

Well, curious about others' opinions too...

bakura10 commented 9 years ago

In some way the job has knowledge about other jobs, as it is in the metadata of the job. So in one case you have an explicit dependency right in the job, in the other case it's hidden in the metadata (and hard to debug). Can't you use listeners, actually? Your job A triggers an event, and a listener for this event inserts job B; then job B executes and triggers an event, and job C is inserted. That way you guarantee the order and have no dependency.

Obviously, the lack of lazy listeners in ZF2 makes this quite inefficient if you have many such listeners, but this sounds like a better solution if you absolutely want to avoid coupling. I'm not sure your approach of chaining is really testable either.

eddiejaoude commented 9 years ago

Concrete use case: I have a doctrine listener that queues several jobs when some record changes 'state'. It needs to process some uploaded JPG (job) and then generate a PDF (job) and then (and only then) copy some files (inc. the PDF and JPG) to a Dropbox FS (job).

That is a very common use case (well, something similar). The way we implemented it previously was that when the Job successfully finished, it triggered an Event which created the next Job.

eg.

Code -> Job A -> Event -> Job B -> Event -> Job C

So Job A was not directly aware of Job B and so on. We didn't notice any performance issues as this was happening Async in the background, but yes events in ZF2 usually have a performance hit.
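A rough sketch of that flow, with loud assumptions: the `Dispatcher` class is a hypothetical minimal stand-in and not the ZF2 EventManager API, the `<name>.finished` event names are invented, and the queue is a plain array processed by a single loop. The payload travelling along with each job is also just an illustration.

```php
<?php
// Hypothetical minimal dispatcher; it stands in for an event manager
// and is not the ZF2 EventManager API.

final class Dispatcher
{
    /** @var array<string, callable[]> */
    private array $listeners = [];

    public function attach(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function trigger(string $event, array $payload = []): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

$queue  = []; // plain array used as a FIFO of [jobName, payload] pairs
$events = new Dispatcher();

// The wiring lives outside the jobs: Job A never mentions Job B.
$events->attach('JobA.finished', function (array $payload) use (&$queue): void {
    $queue[] = ['JobB', $payload];
});
$events->attach('JobB.finished', function (array $payload) use (&$queue): void {
    $queue[] = ['JobC', $payload];
});

$queue[] = ['JobA', ['userId' => 42]];

// Simplified worker loop: run the job, then announce that it finished.
$executed = [];
while (($item = array_shift($queue)) !== null) {
    [$name, $payload] = $item;
    $executed[] = $name; // stand-in for actually executing the job
    $events->trigger($name . '.finished', $payload);
}
```

Because each follow-up job is only inserted once its predecessor has announced completion, the order holds even if several workers consume the same queue.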

basz commented 9 years ago

Yes, these jobs would store knowledge of chained jobs, but only as metadata. So invisible -sort of- to the job itself.

Working via events does not improve this very much as subsequent jobs will/might need information only known when pushed onto the queue (such as the identity of the user doing the push). Such information still needs to be stored in the primary job to be able to retrieve it in an event listener.

events in ZF2 usually have a performance hit

Not worried about performance so much...

I experimented with a special job (one that has a chainJob method and basically does what I implemented in this WIP, reinserting itself until all chained jobs are done). Unfortunately I have no way of retrieving the status of a job: peek (SlmQueueDoctrine specific) returns a job instance, with no information about its status.

I'm not sure your approach of chaining is really testable too.

A test is written - or do you mean functional tests?

hmm, tricky...

basz commented 9 years ago

Well, since I needed something, I have made it into a module... http://github.com/bushbaby/BsbPostponableJobStrategy

Webador / SlmQueue

[WIP] Feature/job chaining #140