agibralter / ts-resque-delta

ThinkingSphinx + Resque delta indexing.
MIT License

Don't enqueue duplicate jobs. #11

Closed: ryansch closed this 12 years ago

ryansch commented 12 years ago

This leverages a fork of resque-loner to prevent duplicate jobs from making it into the queue.

I had to modify resque-loner to bring it up to date. Hopefully my changes will be accepted upstream and we can drop the use of my fork. I'm going to be testing this in production over the next few days to make sure there are no performance issues.

ryansch commented 12 years ago

It occurs to me that my approach might not work for the delta indexes. I'll investigate more tomorrow.

agibralter commented 12 years ago

I'm just curious if you've run into any issues with ts-resque-delta that caused you to need this patch. Right now ts-resque-delta uses resque-lock-timeout to ensure that no two jobs run at once, and it uses a hook to clear the queue of duplicate jobs.
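To illustrate the combination described above (a lock so only one delta job runs at a time, plus a hook that clears duplicate jobs from the queue), here is a minimal in-memory sketch. It assumes a plain Array in place of the Redis-backed queue and a Mutex in place of the timed lock; `DeltaRunner` and its methods are hypothetical names, not the API of ts-resque-delta or resque-lock-timeout.

```ruby
# Hypothetical sketch: an Array stands in for the Resque queue and a
# Mutex for the Redis-backed lock. Not the real ts-resque-delta code.
class DeltaRunner
  attr_reader :queue, :runs

  def initialize
    @queue = []        # pending delta jobs, identified by index name
    @lock  = Mutex.new # ensures only one job runs at a time
    @runs  = []        # record of jobs that actually ran
  end

  def enqueue(index_name)
    @queue << index_name
  end

  # Runs the next job. Returns false if the queue is empty or if
  # another job currently holds the lock.
  def work_one
    index_name = @queue.shift
    return false if index_name.nil?

    unless @lock.try_lock
      @queue.unshift(index_name) # put the job back for a later attempt
      return false
    end

    begin
      # "Hook": drop every remaining copy of this job before running,
      # since a single delta run covers all of them.
      @queue.delete(index_name)
      @runs << index_name
      true
    ensure
      @lock.unlock
    end
  end
end

runner = DeltaRunner.new
3.times { runner.enqueue("post_delta") }
runner.enqueue("user_delta")
runner.work_one
p runner.queue # => ["user_delta"]
p runner.runs  # => ["post_delta"]
```

Note that this cleans up duplicates when a job is picked up, so they can still sit in the queue until then.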

ryansch commented 12 years ago

I'm actually seeing quite a few duplicate flag-as-deleted jobs. I'm also working on a patch that will allow the flag-as-deleted jobs to be worked off more quickly. Our search indexing can't keep up right now.

I may just drop the duplicate stuff from the delta job.

agibralter commented 12 years ago

Hmm... Since flag-as-deleted jobs take record ids, each individual job does have to run. Do you mean there are too many identical flag-as-deleted jobs for the same record? I guess in theory, only one flag-as-deleted job should ever have to run for each record between runs of the full indexer...
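The reasoning above can be sketched as follows: remember which records have already been flagged for each index, and forget them once a full index completes. This is only an in-memory illustration under that assumption; `FlagTracker`, `needs_job?`, and `full_index_completed` are hypothetical names, and a real setup would need shared storage such as Redis.

```ruby
require "set"

# Hypothetical sketch: tracks which document ids have been flagged as
# deleted per index since that index was last fully rebuilt.
class FlagTracker
  def initialize
    # index name => Set of document ids flagged since the last full index
    @flagged = Hash.new { |hash, index| hash[index] = Set.new }
  end

  # Returns true only the first time a document is flagged for an index
  # since its last full index, i.e. when a job actually needs to run.
  def needs_job?(index, document_id)
    !@flagged[index].add?(document_id).nil?
  end

  # After a full index the deleted flags no longer matter, so the
  # record of flagged ids for that index is discarded.
  def full_index_completed(index)
    @flagged.delete(index)
  end
end

tracker = FlagTracker.new
p tracker.needs_job?("post_core", 42) # => true
p tracker.needs_job?("post_core", 42) # => false
tracker.full_index_completed("post_core")
p tracker.needs_job?("post_core", 42) # => true
```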

@freelancing-god -- sorry to bug you, but does my logic there seem correct?

ryansch commented 12 years ago

Yeah, I'm seeing flag-as-deleted jobs with the same document id for the same index(es) in my queue without this patch. I have found a bug in loner which I'll have to fix before we can do anything useful with this.

ryansch commented 12 years ago

I suppose I could also try to figure out why I'm getting duplicate flag-as-deleted jobs. ;-)

agibralter commented 12 years ago

Hmm, I'm trying to figure out what could be going on. It seems like neither job should be enqueued if the record has already been toggled as delta = true...

https://github.com/agibralter/ts-resque-delta/blob/master/lib/thinking_sphinx/deltas/resque_delta.rb#L62
https://github.com/freelancing-god/thinking-sphinx/blob/master/lib/thinking_sphinx/deltas/default_delta.rb#L14

ryansch commented 12 years ago

We can take this discussion private if you'd like to see my config.

ryansch commented 12 years ago

I've come up with an alternate approach that I'm working on now. Closing.

agibralter commented 12 years ago

Ah, awesome -- would love to see what you came up with!