Open nmcglo opened 3 years ago
I believe that I've come up with a solution. If possible, I will put it in its own branch and pull request.
Ultimately, what I've done is add a new protocol to efficiently notify the background ranks that the primary workloads have finished. When non-synthetic workloads finish (this is determined by a slightly modified notify_neighbor() method), they notify the last rank in each non-synthetic job with a new event type, CLI_OTHER_FINISH. Upon receipt of this event, each receiving rank (the last in each workload) will update and check a list of jobs that it knows have been completed. If the number of completed non-synthetic jobs equals the total number of non-synthetic jobs, then the last rank in the highest ordered non-synthetic job will notify all ranks from all synthetic jobs that the primary jobs have completed, thus prompting them to stop generating data.
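A rough model of the protocol described above may help when reading the eventual pull request. This is only a sketch in Python, not the actual implementation: Job, Tracker, run_protocol and the STOP_SYNTHETIC event name are all hypothetical, and only CLI_OTHER_FINISH comes from the description. It assumes at least one non-synthetic job exists and that "highest ordered" means the largest job id.

```python
from dataclasses import dataclass, field

CLI_OTHER_FINISH = "CLI_OTHER_FINISH"  # event name taken from the comment above
STOP_SYNTHETIC = "STOP_SYNTHETIC"      # hypothetical name for the final notification


@dataclass
class Job:
    job_id: int
    num_ranks: int
    synthetic: bool


@dataclass
class Tracker:
    """State kept by the last rank of one non-synthetic job (hypothetical)."""
    job_id: int
    completed: set = field(default_factory=set)


def run_protocol(jobs, finish_order):
    """Deliver CLI_OTHER_FINISH for each finishing job; return the events sent."""
    primary = [j for j in jobs if not j.synthetic]
    synthetic = [j for j in jobs if j.synthetic]
    trackers = {j.job_id: Tracker(j.job_id) for j in primary}
    # Assumption: "highest ordered" non-synthetic job == largest job id.
    notifier_id = max(j.job_id for j in primary)
    events = []

    for finished_id in finish_order:
        # The finished job notifies the last rank of every non-synthetic job.
        for tracker in trackers.values():
            events.append((CLI_OTHER_FINISH, finished_id, tracker.job_id))
            tracker.completed.add(finished_id)
            all_done = len(tracker.completed) == len(primary)
            # Only the designated tracker tells the synthetic ranks to stop.
            if all_done and tracker.job_id == notifier_id:
                for job in synthetic:
                    for rank in range(job.num_ranks):
                        events.append((STOP_SYNTHETIC, job.job_id, rank))
    return events


if __name__ == "__main__":
    jobs = [Job(0, 4, False), Job(1, 4, False), Job(2, 2, True)]
    for event in run_protocol(jobs, finish_order=[1, 0]):
        print(event)
```

In this model the synthetic ranks are only told to stop after the second non-synthetic job reports in, regardless of the order in which the jobs finish.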
I've implemented this and done some small tests, but it's still pretty messy so it's not committed yet. I will do so tomorrow, reference this issue, and close it, since it will be addressed in an upcoming pull request.
Some preliminary testing of my unpushed branch has worked out well. I've written it so that if --max_gen_data is used, the simulation will continue until all workloads, including synthetic ones, have stopped.
Addressed in the current develop branch, will close when released.
The apparent intention of this workload file is to allow multiple traces (or online workloads) to be replayed alongside synthetic background traffic that keeps injecting until the trace-based workloads finish. But the implementation does not allow this: it can have either 1 trace + multiple synthetic or multiple trace + no synthetic.
To allow for constant synthetic injection until the trace has completed, as ranks from trace based workloads finish, they notify their neighboring ranks in their workload that they've finished. Once all have finished, they notify the background traffic that they are all done and that signals the synthetic background traffic to stop generating traffic.
There are multiple issues with how this is implemented. Specifically of note is the notify_neighbor() function. What this appears to attempt to do (which works if there is only one trace) is check: "Am I the last rank in my workload, have I finished, and has my preceding neighbor finished? If so, then we're done; notify background traffic of this fact." But that's not what is actually implemented. Instead, what is implemented is: "Is my local rank within my workload equal to the number of trace ranks - 1? Have I finished, and has my preceding neighbor finished? If so, then we're done; notify every other workload besides mine of this fact and have them change their rank 'is finished' state to 'finished'."
The local rank within a workload will only ever be equal to the number of trace ranks - 1 if there is only one dumpi trace being replayed. If you had two dumpi traces being replayed each with 1000 ranks, the maximum local rank of any one rank would be 999 but the number of trace ranks -1 would be 1999.
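The arithmetic in the two-trace example above can be checked with a few lines of Python. This is only an illustration: implemented_check and intended_check are hypothetical names, and the sketch covers just the rank comparison, not the "have I finished / has my neighbor finished" parts of the condition.

```python
def implemented_check(local_rank, total_trace_ranks):
    """Comparison as described in the issue: against the total across all traces."""
    return local_rank == total_trace_ranks - 1


def intended_check(local_rank, ranks_in_my_workload):
    """Apparent intent: am I the last rank of my own workload?"""
    return local_rank == ranks_in_my_workload - 1


trace_sizes = [1000, 1000]            # two dumpi traces, 1000 ranks each
total_trace_ranks = sum(trace_sizes)  # 2000, so the comparison target is 1999

# For each trace, does any of its ranks ever satisfy the comparison?
fires_implemented = [
    any(implemented_check(r, total_trace_ranks) for r in range(size))
    for size in trace_sizes
]
fires_intended = [
    any(intended_check(r, size) for r in range(size))
    for size in trace_sizes
]

print(fires_implemented)  # [False, False]
print(fires_intended)     # [True, True]
```

With a single trace, the total and the per-workload rank counts coincide, which is why the existing comparison works in that case.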
As long as problem 1 (the rank comparison described above) is not addressed, problem 2 will never occur, but for the sake of documentation: the intention is "once all workloads are complete, notify the background traffic", but the notify_background_traffic() function that is called makes no distinction about which workload ranks it is sending to, so long as they aren't from the workload that is sending the notification. So if there is another, uncompleted trace running, its ranks will also receive a notification that forces each of them to set its is_finished flag to true, even if it is not finished. This means that should they receive a notification from their neighbor that it has finished, they'll forward that message on despite actually being incomplete themselves. This problem is important to note because, should problem 1 be addressed improperly, this could become a major, silent issue.
Moving down in notify_neighbor(), we get to where the 'within a workload notification chain' is started/forwarded. It checks: "Am I finished? Am I local rank 0 or has my preceding neighbor finished? If so, send to the next rank in my workload to start/forward this notification chain."
Anyway, I believe that my work on congestion control will require this to be addressed. Assigning to myself. My goal is to make it so that multiple traces can be replayed with synthetic traffic as well.
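To make problem 2 concrete, here is a small Python model of the notification targeting. It is a sketch under assumptions, not the real code: the Rank class and both functions are hypothetical, and notify_synthetic_only is just one possible restriction, not the fix that will be implemented. It only models who receives the notification, not when it should be sent.

```python
from dataclasses import dataclass


@dataclass
class Rank:
    workload_id: int
    local_rank: int
    synthetic: bool
    is_finished: bool = False


def notify_all_other_workloads(sender_workload, ranks):
    """Behavior described in the issue: every rank outside the sender's workload is marked finished."""
    for r in ranks:
        if r.workload_id != sender_workload:
            r.is_finished = True


def notify_synthetic_only(sender_workload, ranks):
    """One possible restriction (an assumption): only synthetic ranks are told to stop."""
    for r in ranks:
        if r.synthetic and r.workload_id != sender_workload:
            r.is_finished = True


def make_ranks():
    # Workload 0: finished trace; workload 1: trace still running; workload 2: synthetic.
    return (
        [Rank(0, i, False, True) for i in range(2)]
        + [Rank(1, i, False) for i in range(2)]
        + [Rank(2, i, True) for i in range(2)]
    )


broad = make_ranks()
notify_all_other_workloads(0, broad)
# The unfinished trace (workload 1) is now wrongly flagged as finished.
print([r.is_finished for r in broad if r.workload_id == 1])   # [True, True]

narrow = make_ranks()
notify_synthetic_only(0, narrow)
# Workload 1 keeps its real state; only the synthetic ranks are flagged.
print([r.is_finished for r in narrow if r.workload_id == 1])  # [False, False]
print([r.is_finished for r in narrow if r.workload_id == 2])  # [True, True]
```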