
[14.0] queue_job: add mixin to link jobs #573

Closed simahawk closed 4 months ago

simahawk commented 10 months ago

You can easily link records to their related jobs

TODO

OCA-git-bot commented 10 months ago

Hi @guewen, some modules you are maintaining are being modified, check this out!

simahawk commented 10 months ago

decide whether we keep it in queue_job or move it to an extension

@guewen WDYT?

guewen commented 10 months ago

Hey @simahawk,

You can easily link jobs to their related records.

This is actually doing the reverse: it links records to jobs. This is something I'd heavily discourage doing (and have already discouraged in addons reviews). Adding foreign keys on tables pointing to queue_job can really hinder performance, especially if the FK relations have no index.

A job may contain metadata pointing back to records, but I'd prefer to think of jobs as being only "messages", like you'd have if the jobs were in a Redis queue or similar.

Now, if the need is about "recording" whether a job has already been generated / cancelling it on actions etc., it can be handled using idempotent jobs (e.g. the job cancels itself if the action is no longer required). If the need is about UX and you want to show jobs to users, maybe we should rather find a way to provide feedback (send the job status to the UI?) to users?
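
A minimal sketch of the idempotent pattern described here. The model, the `export_required` flag and `_do_export()` are made up for illustration; only `with_delay()` is actual queue_job API:

```python
from odoo import fields, models


class StockPicking(models.Model):
    _inherit = "stock.picking"

    # Hypothetical flag: records whether an export is still wanted.
    export_required = fields.Boolean(default=False)

    def button_request_export(self):
        self.ensure_one()
        # Fire and forget: always enqueue, never keep a link to the job.
        self.export_required = True
        self.with_delay().export_picking()

    def export_picking(self):
        self.ensure_one()
        # Idempotent job: it "cancels itself" if the action is no longer required.
        if not self.export_required:
            return "Export no longer required, nothing to do"
        self._do_export()  # hypothetical helper doing the actual work
        self.export_required = False
```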

simahawk commented 9 months ago

@guewen thanks for your feedback. The main goal for me is to improve the UI. When you have tons of jobs for the same kind of records, it's really handy to be able to jump from those records to their jobs, no matter the state. It's nothing functional. It's really: "Show me all jobs that are directly linked to this record". We could also find them on the fly, but since the stored metadata is not easily "selectable" I'm not sure we'd gain anything in terms of perf. Regarding this, I'd rather add indexes. However, the goal is not to display related jobs inside a view. It's to have an action that opens a tree view on the related jobs. Having an m2m that is only loaded via an action does not sound like something that kills perf. No?
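
For illustration, the kind of action described here could look roughly like the sketch below; the mixin name and the `job_ids` m2m are assumptions, not the PR's actual code:

```python
from odoo import fields, models


class QueueJobRelatedMixin(models.AbstractModel):
    _name = "queue.job.related.mixin"
    _description = "Link records to their jobs (sketch)"

    # The m2m is only read when the action below is triggered,
    # not when the record's form view is rendered.
    job_ids = fields.Many2many("queue.job", string="Related jobs")

    def action_open_related_jobs(self):
        self.ensure_one()
        return {
            "type": "ir.actions.act_window",
            "name": "Related jobs",
            "res_model": "queue.job",
            "view_mode": "tree,form",
            "domain": [("id", "in", self.job_ids.ids)],
        }
```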

guewen commented 9 months ago

I'm speaking of inserting and deleting rows in the queue_job table, which is requirement number one and has to be as fast as possible. Each index slows those operations down, and conversely, FKs from other tables referencing queue_job add checks (e.g. when you delete a job, PostgreSQL has to do a select on every referencing table to check the cascading constraints).
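
To make the cost concrete, a sketch of what such a link looks like at the ORM level; the model is hypothetical and the comments spell out the PostgreSQL side effects referred to above:

```python
from odoo import fields, models


class SaleOrder(models.Model):
    _inherit = "sale.order"

    # This m2o adds a job_id column with a foreign key to queue_job(id).
    # Consequences:
    #  * every DELETE of a queue_job row must check this table for
    #    referencing rows; without an index on job_id that check is a
    #    sequential scan,
    #  * adding an index on job_id instead slows down every insert/update
    #    of sale_order rows,
    # so either the hot DELETE path of queue_job or the writes on the
    # referencing table pay for the link.
    job_id = fields.Many2one("queue.job", ondelete="set null")
```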

simahawk commented 9 months ago

Oh, I see, you are right... Then I'll check how it's possible to find them w/o touching their creation. Do you see any better option other than searching in the serialized field?
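
For reference, "searching in the serialized field" without touching job creation could only be done with raw SQL along these lines; the `records` and `model_name` column names are assumptions about the schema, and the pattern match shows why this is unsatisfying:

```python
from odoo import models


class QueueJobSearchSketch(models.AbstractModel):
    _name = "queue.job.search.sketch"
    _description = "Find jobs from their serialized payload (sketch)"

    def _find_related_jobs(self):
        self.ensure_one()
        # The LIKE match is crude (id 4 also matches "42") and cannot use a
        # btree index, which is exactly the perf limitation discussed here.
        self.env.cr.execute(
            """
            SELECT id FROM queue_job
            WHERE model_name = %s AND records::text LIKE %s
            """,
            (self._name, "%{}%".format(self.id)),
        )
        job_ids = [row[0] for row in self.env.cr.fetchall()]
        return self.env["queue.job"].browse(job_ids)
```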

guewen commented 9 months ago

Another concern: it (m2o or m2m) is meant to be a link from other records to show jobs (other queues using Redis or whatever do not have this and are fine without it, btw), but then people will start to use it for logic such as checking if a job exists, cancelling it, etc. And then you start having jobs locked by other "long" transactions that take locks on them / update them when a user clicks a button. It's really an anti-pattern when dealing with jobs, and oh boy have I seen problems with that. They must be idempotent and fire and forget. This change would probably encourage bad usage of jobs.

simahawk commented 9 months ago

@guewen maybe it's time to convert the records field to jsonb so that we can query it easily?

They must be idempotent and fire and forget.

Yes... and no. First of all, it's nice to be able to jump to a specific record's jobs from its form view. Really a time saver. Second, but most important: sometimes you want to be able to pick the current job to check its state and do something based on it. E.g.: do not schedule another job, or chain a new job to the existing ones (e.g. to avoid sending out new info before the previous one was sent). You could use a channel w/ capacity one, but it's not the same thing because it's also hard to scale. I hope it's clear what I'm trying to address :)
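
As a side note, the "do not schedule another job" half of this is usually expressed with queue_job's identity_key rather than by reading job state; the model and the sync methods in this sketch are made up:

```python
from odoo import models
from odoo.addons.queue_job.job import identity_exact


class ResPartner(models.Model):
    _inherit = "res.partner"

    def button_sync(self):
        # With an identity_key, queue_job skips creating a second job while
        # an identical one is still pending/enqueued, so the caller never
        # needs to look up existing jobs or their state.
        self.with_delay(identity_key=identity_exact).sync_partner()

    def sync_partner(self):
        self._do_sync()  # hypothetical helper doing the actual work
```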

guewen commented 9 months ago

Second, but most important: sometimes you want to be able to pick the current job to check its state and do something based on it. E.g.: do not schedule another job, or chain a new job to the existing ones (e.g. to avoid sending out new info before the previous one was sent). You could use a channel w/ capacity one, but it's not the same thing because it's also hard to scale. I hope it's clear what I'm trying to address

Well, the thing is that this is generally considered a bad pattern in queues, as you shouldn't rely on the state of other jobs. You should not need to care about "not scheduling another job": you should enqueue it anyway, and the job itself has to verify whether it actually has to do something or not.

I get what you are trying to address, but IMO (at least on the 2nd point) that should never be necessary if the jobs are well designed and themselves check the outside state (sometimes it may require adding a state/flag on the model on which the work is done).

Same for this example:

e.g. to avoid sending out new info before the previous one was sent

Store the fact that the info was sent (using steps, state machine, you name it) on the concerned record, but do not use the job's state / existence to track the current "state of the world".

A job is an asynchronous execution of a function; that's all there should be to it. If you were to use actual asynchronous Python functions, you would make them store the state needed to be idempotent and able to recover in case of a reboot.

I'm convinced that designing jobs this way makes their usage more robust and resilient. It's also more understandable (idempotency is always handled in the job, not in the callers).
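
A sketch of the "state on the record, not on the job" approach advocated here; all fields, the model and the backend helper are made up for illustration:

```python
from odoo import fields, models


class ProductTemplate(models.Model):
    _inherit = "product.template"

    export_version = fields.Integer(default=0)    # bumped when data changes
    exported_version = fields.Integer(default=0)  # last version actually pushed

    def request_export(self):
        self.ensure_one()
        # Always enqueue; the job decides later whether anything is left to do.
        self.export_version += 1
        self.with_delay().export_info()

    def export_info(self):
        self.ensure_one()
        # The "state of the world" lives on the record, so duplicate or
        # replayed jobs are harmless and the ordering of sends sorts itself out.
        if self.exported_version >= self.export_version:
            return "Nothing newer to push"
        version = self.export_version
        self._push_to_backend()  # hypothetical helper
        self.exported_version = version
```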

guewen commented 9 months ago

As for using jsonb, yeah it makes sense in any case, but I would still discourage messing with the internals 😉
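
To show what the jsonb option would buy, a containment query of this shape can be backed by a GIN index; the column shape assumed here ({"model": ..., "ids": [...]}) is purely illustrative, not the actual queue_job schema:

```python
import json

from odoo import models


class QueueJobJsonbSketch(models.AbstractModel):
    _name = "queue.job.jsonb.sketch"
    _description = "Find jobs via jsonb containment (sketch)"

    def _find_related_jobs(self):
        self.ensure_one()
        # jsonb containment (@>) can use a GIN index on the column, unlike a
        # LIKE over serialized text, so the lookup stays cheap at scale.
        payload = json.dumps({"model": self._name, "ids": [self.id]})
        self.env.cr.execute(
            "SELECT id FROM queue_job WHERE records @> %s::jsonb",
            (payload,),
        )
        return self.env["queue.job"].browse(
            [row[0] for row in self.env.cr.fetchall()]
        )
```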

github-actions[bot] commented 5 months ago

There hasn't been any activity on this pull request in the past 4 months, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 30 days. If you want this PR to never become stale, please ask a PSC member to apply the "no stale" label.