codes-org / codes

The Co-Design of Exascale Storage Architectures (CODES) simulation framework builds upon the ROSS parallel discrete event simulation engine to provide high-performance simulation utilities and models for building scalable distributed systems simulations
Other
40 stars 16 forks source link

Adaptive routing bug in fat tree model (Imported #159) #159

Closed nmcglo closed 6 years ago

nmcglo commented 6 years ago

Original Issue Author: Philip Taffet Original Issue ID: 159 Original Issue URL: https://xgitlab.cels.anl.gov/codes/codes/issues/159


Hi, I'm not sure if this bug ever occurs in practice, but it seems like there's a small issue with the adaptive routing code for the fat tree model. The relevant code is https://xgitlab.cels.anl.gov/codes/codes/blob/master/src/networks/model-net/fattree.c#L2675

 // when occupancy is same, just choose random port
  outport = tw_rand_integer(lp->rng, start_port, end_port-1);  
  int load = s->vc_occupancy[outport] + s->queued_length[outport];
  if(load != 0) {
    //for(int port = start_port + 1; port < end_port; port++) {
    for(int port = start_port; port < end_port; port++) {
      if(s->vc_occupancy[port] +  s->queued_length[port] < load) {
        load = s->vc_occupancy[port] +  s->queued_length[port];
        outport = port;
        if(load <= 0) break;
      }
    }
  }

The issue is that if the randomly chosen port is not one of the least congested ports, the lowest number least congested port will always be chosen.

For example, if s->vc_occupancy[port] + s->queued_length[port] is 1, 3, 1, 5 for ports 0, 1, 2, 3 respectively, and these four ports all are potential next hops (i.e. start_port = 0, end_port = 4), then the packet will be routed through port 0 with probability .75 and through port 2 with probability .25 instead of a 50/50 split.

Unfortunately, I don't see an easy way to fix it right now other than making a pass to find all of the choices with the least congestion and then picking one randomly.

nmcglo commented 6 years ago

Philip Taffet:

Thanks Nikhil. I was not sure that this bug could be hit very often, but I didn't think about the behavior over time. If the choice is biased towards the lower numbered ports, that should transiently increase their congestion, which means they will no longer be minimal choices.

A few months ago, I thought I saw some data where the lower-port bias was evident, and while investigating it, I discovered this issue. I think the data turned out to be wrong, but I figured I should report this at some point.

nmcglo commented 6 years ago

Philip Taffet:

Hi Noah, I think something like this uses the right probabilities, but is probably slower:

  int* minset = malloc(sizeof(int) * (end_port - start_port));
  int minset_count = 0;
  int load = INT_MAX;
  for(int port = start_port; port < end_port; port++) {
      if(s->vc_occupancy[port] + s->queued_length[port] < load) {
          load = s->vc_occupancy[port] +  s->queued_length[port];
          minset[0] = port;
          minset_count = 1;
      }
      else if (s->vc_occupancy[port] + s->queued_length[port] == load) {
          minset[minset_count++] = port;
      }
  }
  outport = minset[tw_rand_integer(lp->rng, 0, minset_count - 1)];
  free(minset);

It also generates exactly one random integer, so I don't think the reverse event needs to change.

The original code seems to optimize for the load==0 case, but this one doesn't. I could add a loop before this to check for that case. Do you have any idea if this is a bottleneck for the model? I have not profiled CODES yet.

nmcglo commented 6 years ago

Noah Wolfe:

Nice catch Philip. Have you implemented a fix you can submit?

nmcglo commented 6 years ago

Nikhil:

While imperfect, I don't think this is an issue because of the following reason. (note: a typical network doesn't guarantee uniform usage, but adaptively if congestion is present)

Our goal: if congestion arise, pick "a" list loaded port. If the load is same for 2 ports, this chooses the one with lower port count for most cases. Now, if there was congestion, the load for the first port will increase and remain transiently higher, and for the next packet, the other port will be chosen.

Overtime, if the network is so empty that some port get repeatedly chosen, then we are looking at a non congested scenario with no impact on performance. Yes, the links usage is not uniform, but we haven't introduced congestion in the process, so it is not an issue.

nmcglo commented 6 years ago

Philip Taffet:

closed