Open anilpacaci opened 4 years ago
This is indeed an easy way to parallelize the execution of multiple edges. Spanning trees do not need to communicate and as long as the order preserved within a spanning tree, we can provide correct answers. The challenge is load balancing as some spanning trees will get extremely large and do most of the work
A real interesting/challenging parallel execution is the scenario where multiple threads work on the same tree. Here we need to talk about serializability, whenever a thread X
inserts an edge E
to the tree, it needs to make sure that other threads that started before X for a transition before this edge E has already completed. This ensures that edge E
has all the source nodes placed in the tree when X
starts processing
This issue is related to #15 . We can maintain a buffer for incoming edges in each partition, then we can potentially parallelize execution of multiple stream tuples. Here each buffer needs to maintain FIFO order to ensure correctness within a spanning tree.