collectiveidea / delayed_job

Database based asynchronous priority queue system -- Extracted from Shopify
http://groups.google.com/group/delayed_job
MIT License
4.82k stars 956 forks source link

Allow identifier with multiple workers #1191

Open toncid opened 1 year ago

toncid commented 1 year ago

Hello,

Using identifiers with multiple workers has been prevented for 14 years at least. Maybe it is time to reconsider it.

The PR #1190 loosens up that restriction and allows multiple identified job workers to be started with the following process name format:

delayed_job(.<identifier>)(.<worker_index>)

Both fields are optional, so the default process name remains unchanged (delayed_job).

albus522 commented 1 year ago

Why are you trying to do this?

toncid commented 1 year ago

Why are you trying to do this?

In order to better utilize a worker node, we would like to spawn as many workers as possible (until a CPU or memory limit is hit).

Yes, I know that we can call the delayed_job -i <identifierX> start command N times, but this is cleaner.

albus522 commented 1 year ago

Why are you specifying an identifier?

toncid commented 1 year ago

Why are you specifying an identifier?

Because we have multiple nodes running delayed_job workers, so we need job worker identifiers to avoid nodes stepping on each other's toes.

More context for using identifiers can be found e.g. in #866.

albus522 commented 1 year ago

You are referencing a very old issue that had more than an identifier at play. Delayed Job automatically handles multiple nodes just fine except in rare non-standard scenarios. Are you not able to give each node a unique hostname?

toncid commented 1 year ago

Thanks, that's a good question. I have checked in the DB and I see that jobs are locked as expected (with the identifier and node's hostname + PID):

locked_by: "delayed_job.1685897776 host:ip-172-28-22-203.eu-west-1.compute.internal pid:2618",

Ought to be unique enough.

We have implemented identifiers back in January 2018. One-off jobs were fired multiple times back then, so adding identifiers helped. I do not have much information on why the locking mechanism did not work well back then.

toncid commented 1 year ago

Setting my particular use case aside, what about the PR? Looks good?

I don't see much harm with enabling it, especially since a Bash workaround is supported anyway:

delayed_job -i 123456.0 start
delayed_job -i 123456.1 start
delayed_job -i 123456.2 start
delayed_job -i 123456.3 start