alphasights / sneakers_handlers

Retries with exponential backoff for sneakers
MIT License

[WIP] Spike delayed retry #8

Closed · jjbohn closed 8 years ago

jjbohn commented 8 years ago

Spike out a delayed retry.

Leverages x-death headers from dead letter exchanges to count retries. Does not require a special dead letter routing key.

This uses three exchanges (these can be shared by queues), and it requires two queues per primary queue (a retry and an error). The retry queue is a holding place whose messages are TTL'd and dead-lettered back to the primary queue. Once the max retries have been exhausted, the message is published to the error exchange (with the routing key of the original message).
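
A rough sketch of the counting side (not the PR's code; the handler shape, `MAX_RETRIES`, and calling the Bunny client directly are assumptions):

```ruby
MAX_RETRIES = 5 # hypothetical limit

# Called when a worker gives up on a message. delivery_info, properties
# and payload are the arguments Bunny yields to a consumer block.
def handle_failure(channel, delivery_info, properties, payload)
  x_death = (properties.headers || {})["x-death"] || []
  # RabbitMQ records an x-death entry each time a queue dead-letters
  # the message; its "count" field tracks repeats per queue.
  retries = x_death.map { |death| death["count"].to_i }.max.to_i

  if retries < MAX_RETRIES
    # Reject without requeue: the queue's x-dead-letter-exchange
    # sends the message to the retry exchange.
    channel.reject(delivery_info.delivery_tag, false)
  else
    # Give up: publish to the error exchange, keeping the original routing key.
    channel.topic("domain_events.error", durable: true)
           .publish(payload, routing_key: delivery_info.routing_key)
    channel.ack(delivery_info.delivery_tag)
  end
end
```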

Would end up looking like this for us:

Exchanges

- domain_events
- domain_events.retry
- domain_events.error

Queues

- cp-dashboard.user_syncer
- cp-dashboard.user_syncer.retry
- cp-dashboard.user_syncer.error

So you can see we share exchanges, but each queue gets its own retry and error queue.
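
For illustration, declaring that topology with Bunny might look something like this (a sketch, not the PR's code; the 30s retry TTL and the `users.#` binding key are made up):

```ruby
require "bunny"

conn = Bunny.new.tap(&:start)
ch = conn.create_channel

# Shared exchanges
primary = ch.topic("domain_events", durable: true)
retry_x = ch.topic("domain_events.retry", durable: true)
error_x = ch.topic("domain_events.error", durable: true)

# Primary queue: failures are dead-lettered to the retry exchange
ch.queue("cp-dashboard.user_syncer", durable: true, arguments: {
  "x-dead-letter-exchange" => "domain_events.retry"
}).bind(primary, routing_key: "users.#")

# Retry queue: holds messages until the TTL fires, then dead-letters
# them back to the primary exchange, i.e. back into the primary queue
ch.queue("cp-dashboard.user_syncer.retry", durable: true, arguments: {
  "x-dead-letter-exchange" => "domain_events",
  "x-message-ttl"          => 30_000
}).bind(retry_x, routing_key: "users.#")

# Error queue: final resting place once retries are exhausted
ch.queue("cp-dashboard.user_syncer.error", durable: true)
  .bind(error_x, routing_key: "users.#")
```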

Flow

[Diagram: delayed retry flow]

Dead letter flow

[Diagram: dead letter flow (Google Drawings)]

Exponential backoff

We can accomplish this with a similar setup, but with more queues.
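
Sketching that idea, reusing the channel `ch` from the Bunny sketch above (the queue names and the doubling schedule are illustrative, not from the PR):

```ruby
# Hypothetical ladder of retry queues with doubling TTLs. Expired messages
# dead-letter back to the primary exchange. Caveat: publishing into a level
# via the default exchange rewrites the routing key to the queue name, so a
# real implementation must restore the original key (e.g. with a fixed
# x-dead-letter-routing-key per consumer, or by republishing explicitly).
[8, 16, 32, 64].each do |seconds|
  ch.queue("cp-dashboard.user_syncer.retry.#{seconds}s", durable: true, arguments: {
    "x-message-ttl"          => seconds * 1_000,
    "x-dead-letter-exchange" => "domain_events"
  })
end
```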

jjbohn commented 8 years ago

@brianstorti Here's a spike of a delayed retry mechanism. It's basically what you've suggested before. Need to test it though. The timing thing is going to be a bit funky to test, but wanted to get your take on it before I spent a ton of time on it.

toreriklinnerud commented 8 years ago

@jjbohn thoughts on having a queue created automatically for each exponential backoff level, but having them expire and disappear once they no longer contain any messages? I think this can be done by setting the TTL on the queue itself (x-expires) to something slightly higher than the x-message-ttl on the queue. That way you would end up with:

- cp-dashboard.user_syncer.retry.8s
- cp-dashboard.user_syncer.retry.16s
- cp-dashboard.user_syncer.retry.32s
- cp-dashboard.user_syncer.retry.1m4s
- etc.

Looks messy, but they would disappear again once they no longer contain any messages.
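
The proposal would amount to something like this (a sketch of the 8s level only; the 10s queue expiry is an assumed "slightly higher" value):

```ruby
# Hypothetical auto-expiring retry level: messages sit for 8s and then
# dead-letter back to the primary exchange; the broker deletes the queue
# itself once it has gone unused slightly longer than that.
ch.queue("cp-dashboard.user_syncer.retry.8s", durable: true, arguments: {
  "x-message-ttl"          => 8_000,
  "x-dead-letter-exchange" => "domain_events",
  "x-expires"              => 10_000 # slightly higher than x-message-ttl
})
```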

jjbohn commented 8 years ago

Had the same thought initially, but queue expiry deletes the queue, so nothing is left listening on the routing key; that wouldn't work.

brianstorti commented 8 years ago

Cool. One thought: if we are going to use multiple queues for exponential backoff, can't we avoid the retry exchange altogether?

Something similar to what was done here (https://github.com/alphasights/cp-dashboard/pull/210), but having a final resting queue (i.e. a queue with no x-message-ttl set). It'd work like this:

(Using as an example a setup with a short and a long queue; it's up to us to define how many queues we want to have: queue.10s, queue.30s, queue.10m, etc.)

| Queue                | Exchange          | TTL (ms)    |
|----------------------|-------------------|-------------|
| cp.user_syncer       | domain_events     | Not defined |
| cp.user_syncer.short | domain_events.dlx | 5000        |
| cp.user_syncer.long  | domain_events.dlx | 30000       |
| cp.user_syncer.dlx   | domain_events.dlx | Not defined |

- Message x published to cp.user_syncer;
- For the first 10 failures the message is published to cp.user_syncer.short;
- For the next 30 failures the message is published to cp.user_syncer.long;
- After that, the message is published to cp.user_syncer.dlx, where it needs to be handled manually.
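
One possible reading of that flow in code, as shown below (not from the PR; the thresholds come from the bullets above, and publishing via the default exchange, where the routing key is the queue name, is an assumption):

```ruby
# Pick the resting queue from the failure count, then publish straight
# to it through the default exchange (routing key == queue name).
def backoff_queue_for(failures)
  case failures
  when 0...10  then "cp.user_syncer.short" # 5s TTL
  when 10...40 then "cp.user_syncer.long"  # 30s TTL
  else              "cp.user_syncer.dlx"   # no TTL: handled manually
  end
end

def retry_later(channel, payload, failures)
  channel.default_exchange.publish(payload, routing_key: backoff_queue_for(failures))
end
```
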
jjbohn commented 8 years ago

Yeah, that makes sense. I'll rig that up.

jjbohn commented 8 years ago

Ooo, you know @toreriklinnerud, just had a thought on what I could do for the "queue cleanup". Will add it to this spike today.