Closed nmcglo closed 6 years ago
Philip Taffet:
Thanks Nikhil. I was not sure that this bug could be hit very often, but I didn't think about the behavior over time. If the choice is biased towards the lower numbered ports, that should transiently increase their congestion, which means they will no longer be minimal choices.
A few months ago, I thought I saw some data where the lower-port bias was evident, and while investigating it, I discovered this issue. I think the data turned out to be wrong, but I figured I should report this at some point.
Philip Taffet:
Hi Noah, I think something like this uses the right probabilities, but is probably slower:
int* minset = malloc(sizeof(int) * (end_port - start_port));
int minset_count = 0;
int load = INT_MAX;
for(int port = start_port; port < end_port; port++) {
if(s->vc_occupancy[port] + s->queued_length[port] < load) {
load = s->vc_occupancy[port] + s->queued_length[port];
minset[0] = port;
minset_count = 1;
}
else if (s->vc_occupancy[port] + s->queued_length[port] == load) {
minset[minset_count++] = port;
}
}
outport = minset[tw_rand_integer(lp->rng, 0, minset_count - 1)];
free(minset);
It also generates exactly one random integer, so I don't think the reverse event needs to change.
The original code seems to optimize for the load==0
case, but this one doesn't. I could add a loop before this to check for that case.
Do you have any idea if this is a bottleneck for the model? I have not profiled CODES yet.
Noah Wolfe:
Nice catch Philip. Have you implemented a fix you can submit?
Nikhil:
While imperfect, I don't think this is an issue because of the following reason. (note: a typical network doesn't guarantee uniform usage, but adaptively if congestion is present)
Our goal: if congestion arise, pick "a" list loaded port. If the load is same for 2 ports, this chooses the one with lower port count for most cases. Now, if there was congestion, the load for the first port will increase and remain transiently higher, and for the next packet, the other port will be chosen.
Overtime, if the network is so empty that some port get repeatedly chosen, then we are looking at a non congested scenario with no impact on performance. Yes, the links usage is not uniform, but we haven't introduced congestion in the process, so it is not an issue.
Philip Taffet:
closed
Original Issue Author: Philip Taffet Original Issue ID: 159 Original Issue URL: https://xgitlab.cels.anl.gov/codes/codes/issues/159
Hi, I'm not sure if this bug ever occurs in practice, but it seems like there's a small issue with the adaptive routing code for the fat tree model. The relevant code is https://xgitlab.cels.anl.gov/codes/codes/blob/master/src/networks/model-net/fattree.c#L2675
The issue is that if the randomly chosen port is not one of the least congested ports, the lowest number least congested port will always be chosen.
For example, if
s->vc_occupancy[port] + s->queued_length[port]
is1, 3, 1, 5
for ports0, 1, 2, 3
respectively, and these four ports all are potential next hops (i.e.start_port = 0, end_port = 4
), then the packet will be routed through port 0 with probability .75 and through port 2 with probability .25 instead of a 50/50 split.Unfortunately, I don't see an easy way to fix it right now other than making a pass to find all of the choices with the least congestion and then picking one randomly.