MetPX / sarracenia

https://MetPX.github.io/sarracenia

winnow + vip enhancement #913

Open reidsunderland opened 9 months ago

reidsunderland commented 9 months ago

When running a winnow on a cluster, you have to use a VIP to ensure that duplicates get suppressed.

If you don't use a VIP, each node will have a different nodupe cache, and duplicates could get through if the first fileA message goes to node1 and the second (duplicate) fileA message goes to node2.

Currently on v2 and sr3, the node with the VIP subscribes and posts messages and the other nodes do nothing. If the VIP switches to a different node, the nodupe cache on that node will be empty and there's a chance duplicate messages could be posted.
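
For context, a minimal sketch of what such a winnow configuration might look like, deployed identically on every node of the cluster (broker host, exchange names and the vip address are placeholders; option names follow the usual sr3 conventions):

```
# winnow config sketch: same file on every cluster node
broker amqps://feeder@broker.example.com/
exchange xs_input
post_broker amqps://feeder@broker.example.com/
post_exchange xwinnow_out
vip 192.168.1.100
nodupe_ttl 3600
```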

An improvement would be to make this work more like poll does now (a rough sketch follows below).

All nodes would need to use a unique queue subscribed to the source exchange.

Node with the VIP:

Nodes without the VIP:
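
A rough sketch of how the work could be split between nodes in such a poll-style scheme (this is an interpretation of the proposal; the helper names are made up, not sarracenia calls):

```python
# Sketch only: poll-style winnow behaviour on each cluster node.
# has_vip() and post() are hypothetical placeholders.
nodupe_cache = set()

def has_vip():
    return True                # placeholder: real code checks whether this node holds the vip

def post(msg_id):
    print("posting", msg_id)   # placeholder: real code re-publishes to the post exchange

def process(msg_id):
    new = msg_id not in nodupe_cache
    nodupe_cache.add(msg_id)   # every node keeps its own cache current
    if new and has_vip():
        post(msg_id)           # only the vip holder posts winnowed messages
    # nodes without the vip stay in sync but post nothing
```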

petersilva commented 9 months ago

This is the same as v2 behaviour... not a regression, but an opportunity for improvement.

petersilva commented 9 months ago

The current method sends duplicates when the vip changes owners; the idea/goal is to fix that.

The vip voting scheme can introduce significant periods (minutes) where the vip is either in the wrong place, or nowhere at all, while a transfer is in progress.

Method 1: separate queues, vip gates posting.

SWOT:

Method 2: poll style.

SWOT:

observations...

petersilva commented 9 months ago

Method 3: one queue with two bindings... then use the exchange to differentiate input from output.

This would address the weakness of both other methods: if a node is slowly dying, the failover node will start queueing up messages that it hasn't seen in the output queue, and when it gets the vip, it will catch up.

This works if the input and output exchanges are on the same broker, so that a single queue can have bindings to both.
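
A rough, non-sarracenia sketch of the idea using pika, with one queue bound to both exchanges and the delivery's exchange name used to tell a fresh source message apart from one already posted by the vip holder (broker, exchange, and queue names are made up, and the vip check is a placeholder):

```python
# Method 3 sketch: one queue, two bindings, differentiate by exchange.
import pika

seen = set()                         # stand-in for the nodupe cache

def have_vip():
    return True                      # placeholder vip check

conn = pika.BlockingConnection(pika.ConnectionParameters('broker.example.com'))
ch = conn.channel()
ch.queue_declare(queue='q_winnow_node1', durable=True)
ch.queue_bind(queue='q_winnow_node1', exchange='xs_input', routing_key='#')
ch.queue_bind(queue='q_winnow_node1', exchange='xwinnow_out', routing_key='#')

def on_message(channel, method, properties, body):
    if method.exchange == 'xwinnow_out':
        seen.add(body)               # already posted by the vip holder: just prime the cache
    elif body not in seen:
        seen.add(body)
        if have_vip():               # only the vip holder re-posts winnowed messages
            channel.basic_publish(exchange='xwinnow_out',
                                   routing_key=method.routing_key, body=body)
    channel.basic_ack(method.delivery_tag)

ch.basic_consume(queue='q_winnow_node1', on_message_callback=on_message)
ch.start_consuming()
```

The point of the second binding is that a standby node keeps seeing what the vip holder posts, so its cache stays warm even while it is not posting.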

petersilva commented 7 months ago

Method 4: a nodupe.sync class... gather() implements a second subscriber, with settings:

It is installed with callback_prepend and has two entry points: gather and after_accept.

Gather is a normal gather (like gather/message), but for every message gathered, you add a field: m["from_nodupe_sync_cache"] = True.

Then have an after_accept entry point that drops all messages with that field set, so the cache is primed.

This seems really easy to do... and kind of a general way to explore shared state caches.
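
A rough skeleton of what such a callback might look like, assuming sr3-style flowcb entry points (the class name and the second-subscriber setup are placeholders, not the actual nodupe.sync implementation, and entry point signatures vary between sr3 versions):

```python
# Method 4 sketch: prime the local nodupe cache from a second subscription.
# Intended to be installed with callback_prepend, per the proposal above.
from sarracenia.flowcb import FlowCB

class Sync(FlowCB):
    def __init__(self, options):
        super().__init__(options)
        self.subscriber = None   # placeholder: a second subscription to the post exchange

    def gather(self):
        # pull in whatever the other nodes have already posted, marking each
        # message so later callbacks know it exists only to prime the cache.
        messages = self.subscriber.newMessages() if self.subscriber else []
        for m in messages:
            m["from_nodupe_sync_cache"] = True
        return messages

    def after_accept(self, worklist):
        # the idea is that nodupe has already recorded these messages, so the
        # cache is primed; drop the flagged ones so they are never re-posted.
        still_incoming = []
        for m in worklist.incoming:
            if m.get("from_nodupe_sync_cache"):
                worklist.rejected.append(m)
            else:
                still_incoming.append(m)
        worklist.incoming = still_incoming
```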