Closed abrom closed 5 years ago
Would probably switch back to use the built in create_index
helper and drop the text_pattern_ops
operator class to make it DB agnostic (doesn't need it given it's doing an exact match anyway)
@abrom Sorry for the extremely long-latency response!
I definitely agree that the LIKE
query is suboptimal, and I like the idea of adding a column to the delayed_jobs
table to uniquely identify the jobs. This behavior would have to be optional --- if the column exists, we should use it, and otherwise fall back on the LIKE
query.
I'm going to close this issue for now, but if you're interested in contributing a PR, I'd be very grateful. I think this should comprise:
recurring_job_descriptor
) and corresponding index to the delayed_jobs
tablejobs
method should use the value of the column rather than the current LIKE
queryLIKE
query matches on job class name and, optionally, a job_matching_param
. Adding this optimization shouldn't change the behavior, so we should store, and query on, this pair of values in recurring_job_descriptor
.Ideally, the migration would also populate recurring_job_descriptor
for existing delayed jobs (to avoid ending up with duplicates on the first run after the migration is run), but this is tricky; it would require not just scanning the entire delayed_jobs
table but deserializing each handler to determine if it's a RecurringJob and generate its recurring_job_descriptor
. So I would probably just include a warning in the README instructing users to manually unschedule any existing scheduled jobs before deploying this change.
No worries @afn
I suspect the better solution might be to have the optional handler name column implemented in delayed job, then DJR would just check for it's presence within the jobs method and switch as appropriate.
The only problem I see with that is that the DJ project appears to have gone stale :( No responses to issues since March.
I'll create an issue there and see if I get any bites. I think it would be useful to have this information in general. I also use the name/type column for analytic/queue management purposes (ie I care if X or Y job have been overdue for a period, but not Z).
Not sure if I'd class this as an issue or maybe instead something just to be documented, but here seems like a good a place as any to kick off a discussion. I had some issues last night where there were ~ 200k jobs in the queue and the recurring jobs started to die rather badly from some PG timeouts due to the fuzzy nature of the query in the
jobs
class method added in the RecurringJob module.Adding an index on the handler column is in reality not an option as the column is just too big. Not designed to be indexed, especially not in a way that would allow for a practical use of fuzzy queries (
LIKE
) etc.My solution was to add a
handler_type
column which contained the jobname
for ease of filter.Then added the following to an initialiser to handle new enqueued jobs as well as patching RecurringJob:
Solved my performance issues :) However it becomes more than just a plugin with this sort of change..
Feel free to close, or I'd be happy to put together some sort of formal description of the problem (and possible solution) to be included in the README?