Open petersilva opened 5 hours ago
so... maybe we need to define a topology putting hostnames in a directive?
cluster_nodes transfer04 transfer05 transfer06
so with such a layout, transfer04 is "NODE" 0, transfer05 is NODE 1, and transfer06 is NODE 2. The count of the hostnames gives the node count, and we can distribute things across the nodes using the NODE numbers, with the count as a divisor.
the other wrinkle is... if you add/remove a node, you may need to redo all the bindings... so then #35 becomes a dependency.
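A rough sketch of the idea above, in Python. The `cluster_nodes` directive is only the proposal from this comment, not an existing sr3 option, and the helper names (`parse_cluster_nodes`, `node_index`, `exchanges_for_node`) are made up for illustration. It assumes exchanges are numbered from 0 and dealt out by modulo of the node count:

```python
import socket


def parse_cluster_nodes(line):
    """Split a (hypothetical) cluster_nodes directive into an ordered host list."""
    words = line.split()
    if not words or words[0] != "cluster_nodes":
        raise ValueError("not a cluster_nodes directive: %r" % line)
    return words[1:]


def node_index(hosts, hostname=None):
    """Return the NODE number of hostname, i.e. its position in the list."""
    if hostname is None:
        # Assumption: the directive lists short hostnames, so drop any domain.
        hostname = socket.gethostname().split(".")[0]
    return hosts.index(hostname)


def exchanges_for_node(hosts, hostname, exchange_count):
    """List the exchange numbers (0-based) whose remainder matches this node."""
    node = node_index(hosts, hostname)
    return [x for x in range(exchange_count) if x % len(hosts) == node]


hosts = parse_cluster_nodes("cluster_nodes transfer04 transfer05 transfer06")
for host in hosts:
    print(host, node_index(hosts, host), exchanges_for_node(hosts, host, 6))

# Dropping a node changes the divisor, so the assignments shift for the others too.
smaller = [h for h in hosts if h != "transfer05"]
print("transfer06 after removal:", exchanges_for_node(smaller, "transfer06", 6))
```

With three nodes, transfer06 gets exchanges 2 and 5; once transfer05 is removed it gets 1, 3 and 5. That is the rebinding wrinkle: changing the host list moves exchanges between the surviving nodes, not just the ones the removed node held.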
fwiw... it seems like auditing the flow of file operations is much simpler and more straightforward than implementing all this. We could figure it out and make hand-crafted configurations for each subscriber today... but they would be quite painful to modify. Having sr3 make the necessary calculations & linkages would be an easier approach for the analysts, but it's still a long haul.
All the implementation will do is serialize the transfers to preserve ordering, which, in the vast majority of cases, is not needed. It will reduce performance, but how much is unclear.
An audit that identifies files with potential access race conditions will preserve parallelism and maximize performance... but it requires more analysis work at deployment time and from developers.
See Corresponding C implementation issue for background: https://github.com/MetPX/sarrac/issues/174
Things that need to be done to robustly support that on the python side:
If we want to implement exchangeSplit properly, then, as pointed out here: https://github.com/MetPX/sarrac/issues/174#issuecomment-2476906203
we need to establish instances over an entire cluster, not just a single node, in order to get singleton processing working properly.
e.g... a winnow publishes to 20 exchanges... we have 4 consuming nodes, each with 5 instances... we would want the bindings to look something like:
x1 -> n1i1, x2 -> n1i2, ... x5 -> n1i5, x6 -> n2i1, ... x10 -> n2i5, x11 -> n3i1, ... x15 -> n3i5, ...
or another mapping would be to have subsets of instances on each node...
x1 -> n1i1, ... x6 -> n2i6, ... x11 -> n3i11, ...
We would have to do the math one way or another, and create the exchanges, queues and bindings.
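For what it's worth, the math for both mappings is small. Here is a sketch in Python, assuming exchanges, nodes and instances are all numbered from 1 as in the examples above, and that every node runs the same number of instances. The function names are hypothetical, and nothing here declares exchanges, queues or bindings on a broker; it only computes which node/instance each exchange would be bound to:

```python
def block_binding(exchange, instances_per_node):
    """First mapping: instance numbers restart at 1 on every node."""
    node, instance = divmod(exchange - 1, instances_per_node)
    return node + 1, instance + 1


def global_binding(exchange, instances_per_node):
    """Second mapping: instance numbers run across the whole cluster,
    so each node hosts a subset (node 1: 1-5, node 2: 6-10, ...)."""
    node = (exchange - 1) // instances_per_node + 1
    return node, exchange


def label(node, instance):
    return "n%di%d" % (node, instance)


NODES = 4
INSTANCES_PER_NODE = 5
EXCHANGES = NODES * INSTANCES_PER_NODE  # 20, as in the winnow example

for x in range(1, EXCHANGES + 1):
    print("x%d -> %s   or   x%d -> %s" % (
        x, label(*block_binding(x, INSTANCES_PER_NODE)),
        x, label(*global_binding(x, INSTANCES_PER_NODE))))
```

Both mappings put the same exchanges on the same nodes; they only differ in how the instances are numbered. If the exchange count is not nodes times instances per node, some other rule (modulo, for instance) would be needed, which this sketch does not attempt.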