ferdinandhubbard981 / GameOfLife-Distributed


HALO effect (to implement after we fixed important bugs) #6

Open ferdinandhubbard981 opened 2 years ago

ferdinandhubbard981 commented 2 years ago

https://github.com/MathsPsychopath/GameOfLife/issues/13 continuation of this discussion

achan-css commented 2 years ago

Currently, the broker holds the whole 2D world as internal state. When workers are primed or reinitialised, it slices the world according to workSize and the worker count, and distributes the slices to the workers. Each worker evolves one iteration, then sends the flipped cells back to the broker.
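A minimal sketch of the slicing step, assuming the world is split by rows as evenly as possible. `rowRange` and `splitRows` are hypothetical names, not taken from the repository, and the existing workSize logic may behave differently:

```go
package main

import "fmt"

// rowRange is a hypothetical half-open range [Start, End) of world rows
// owned by one worker.
type rowRange struct{ Start, End int }

// splitRows divides height rows among workers as evenly as possible.
// The first height%workers slices receive one extra row.
func splitRows(height, workers int) []rowRange {
	if workers <= 0 || height <= 0 {
		return nil
	}
	if workers > height {
		workers = height // never hand out empty slices
	}
	base, extra := height/workers, height%workers
	ranges := make([]rowRange, 0, workers)
	start := 0
	for i := 0; i < workers; i++ {
		size := base
		if i < extra {
			size++
		}
		ranges = append(ranges, rowRange{Start: start, End: start + size})
		start += size
	}
	return ranges
}

func main() {
	fmt.Println(splitRows(16, 3)) // [{0 6} {6 11} {11 16}]
}
```

Handing the remainder rows to the first few workers keeps slice sizes within one row of each other.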

On turns where the workers have not been reinitialised, they build on their own internal state and use the halos supplied by the broker.

ferdinandhubbard981 commented 2 years ago

You're suggesting that the halo is sent from worker -> broker -> newWorker

In the readme they suggest sending it directly from worker -> worker (this would obviously be better). The broker still needs to receive the world every turn, so each worker can send its slice of the world to the broker without waiting for any response. The broker then assembles the slices into a world and updates its internal state.
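To make the worker -> worker exchange concrete, here is a rough sketch of which rows each worker would send and to whom. It assumes the workers form a ring (a wrap-around world) and that cells are stored as bytes; `edgeRows` and `neighbours` are hypothetical helpers, not existing code:

```go
package main

import "fmt"

// edgeRows returns copies of the first and last rows of a worker's slice.
// These are the rows its neighbours need as halos for the next turn.
func edgeRows(slice [][]byte) (top, bottom []byte) {
	if len(slice) == 0 {
		return nil, nil
	}
	top = append([]byte(nil), slice[0]...)
	bottom = append([]byte(nil), slice[len(slice)-1]...)
	return top, bottom
}

// neighbours returns the indices of the workers above and below worker i,
// assuming n workers arranged in a ring (assumption: wrap-around world).
func neighbours(i, n int) (above, below int) {
	return (i - 1 + n) % n, (i + 1) % n
}

func main() {
	slice := [][]byte{{0, 255, 0}, {0, 0, 0}, {255, 0, 255}}
	top, bottom := edgeRows(slice)
	above, below := neighbours(0, 4)
	// The top row becomes the bottom halo of the worker above, and vice versa.
	fmt.Println("send", top, "to worker", above)
	fmt.Println("send", bottom, "to worker", below)
}
```

The rows are copied so that a worker can keep evolving its slice while the halos are in flight.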

I will update the flowchart to try and explain what I mean, and we'll talk about it when we're ready to start implementing it (tomorrow).

achan-css commented 2 years ago

What I don't like about halo exchange is that it requires the IP address of the next worker. If the next worker disconnects then it's going to error, causing the one sending to that to error - a chain reaction. The work would also need to be redistributed evenly between workers

We can probably have a branch with halo exchange and another for fault tolerance. Alternatively, we might have the workers poll for the IP address of the ones to listen to and get them from there. But this is communication overhead.

ferdinandhubbard981 commented 2 years ago

> What I don't like about halo exchange is that it requires the IP address of the next worker. If the next worker disconnects then it's going to error, causing the one sending to that to error - a chain reaction.

If a worker disconnects:
- the broker assigns a new worker to that slice
- the broker gives the replacement worker's IP address to the workers neighbouring it
- new RPC connections are made between these workers
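The replacement steps above could be sketched roughly as follows, assuming the broker keeps worker addresses in a slice ordered by world position and the workers form a ring. `replaceWorker` is a hypothetical helper and the addresses are placeholders:

```go
package main

import "fmt"

// replaceWorker swaps in newAddr for the worker at index i and returns the
// indices of the neighbours that must be told about the new address so they
// can open fresh RPC connections to it.
func replaceWorker(addrs []string, i int, newAddr string) []int {
	n := len(addrs)
	if i < 0 || i >= n {
		return nil
	}
	addrs[i] = newAddr
	above, below := (i-1+n)%n, (i+1)%n
	if above == below {
		if above == i {
			return nil // a single worker is its own neighbour
		}
		return []int{above} // two workers share one neighbour
	}
	return []int{above, below}
}

func main() {
	// Placeholder addresses, for illustration only.
	addrs := []string{"10.0.0.1:8030", "10.0.0.2:8030", "10.0.0.3:8030"}
	notify := replaceWorker(addrs, 1, "10.0.0.9:8030")
	fmt.Println(addrs[1], notify) // 10.0.0.9:8030 [0 2]
}
```

Only the two neighbours need updating, so the rest of the ring is untouched by the replacement.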

> The work would also need to be redistributed evenly between workers

I don't think work distribution would be affected by this. Each worker is assigned a slice of x rows, nothing has changed.

> We can probably have a branch with halo exchange and another for fault tolerance.

If we start work on HALO exchange before we finish fault tolerance and steps 2 and 4, the merge conflicts would be terrible, if not impossible, and would require a lot of code rewriting. I think it is best to do the tasks in sequential order.

> Alternatively, we might have the workers poll for the IP address of the ones to listen to and get them from there. But this is communication overhead.

Are you talking about when a worker disconnects? If so I agree, but I don't think the communication overhead would trump the increased execution speed of the HALO exchange.

ferdinandhubbard981 commented 2 years ago

Here is how I will implement it: the broker will have a goroutine constantly receiving flippedCells paired with a turn. If pairedTurn == b.turn+1 and len(flippedCellBatches) == numOfWorkers, apply the flipped cells to the world and turn++.

meanwhile: worker carries on processing next turn as soon as it receives halo from adjacent workers

if worker breaks: reprime all workers, and start from b.currentWorld and b.turn.
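A rough sketch of the broker side of this plan, under a few assumptions: cells are stored as 0 or 255 so a flip is an XOR with 255, each worker sends exactly one batch per turn, and batches arrive on a channel. The types `cell`, `batch`, and `broker` and the method `receive` are hypothetical, not existing code:

```go
package main

import "fmt"

type cell struct{ X, Y int }

// batch is one worker's flipped cells, paired with the turn they produce.
type batch struct {
	Turn  int
	Cells []cell
}

type broker struct {
	world      [][]byte
	turn       int
	numWorkers int
	pending    map[int][]batch // batches received so far, keyed by turn
}

// receive stores a batch and, once every worker has reported for turn+1,
// applies those flips and advances the turn. Workers may run ahead, so
// batches for later turns are buffered until their turn comes up.
// It reports whether the turn advanced.
func (b *broker) receive(bt batch) bool {
	if bt.Turn <= b.turn {
		return false // stale batch for a turn already applied
	}
	b.pending[bt.Turn] = append(b.pending[bt.Turn], bt)
	advanced := false
	for b.numWorkers > 0 && len(b.pending[b.turn+1]) == b.numWorkers {
		for _, ready := range b.pending[b.turn+1] {
			for _, c := range ready.Cells {
				b.world[c.Y][c.X] ^= 255 // assumption: cells are 0 or 255
			}
		}
		delete(b.pending, b.turn+1)
		b.turn++
		advanced = true
	}
	return advanced
}

func main() {
	b := &broker{
		world:      [][]byte{{0, 0}, {0, 0}},
		numWorkers: 2,
		pending:    map[int][]batch{},
	}
	in := make(chan batch)
	done := make(chan struct{})
	go func() { // the constantly receiving goroutine
		for bt := range in {
			b.receive(bt)
		}
		close(done)
	}()
	in <- batch{Turn: 1, Cells: []cell{{X: 0, Y: 0}}}
	in <- batch{Turn: 1, Cells: []cell{{X: 1, Y: 1}}}
	close(in)
	<-done
	fmt.Println(b.turn, b.world)
}
```

If a worker breaks, discarding `pending` and repriming from `b.world` and `b.turn` matches the recovery step described above, since only fully applied turns ever reach the broker's world.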