Closed: nmcglo closed this issue 5 years ago
Misbah Mubarak:
Status changed to closed
Jonathan Jenkins:
mentioned in issue #143
Jonathan Jenkins:
The terminal LP's router id is calculated as follows:
s->router_id=(int)s->terminal_id / (s->params->num_routers/2);
terminal_id is 1044 and the num_routers parameter (the configuration file entry of the same name) is 4, making router_id 1044 / (4/2) = 522. Is there some underlying assumption about the terminal/router/group makeup that this setup violates? I'm also getting the simulation output
Total nodes 72 routers 36 groups 9 radix 8
This doesn't match the router count in the config file. Does that have something to do with it?
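For reference, the arithmetic above can be reproduced in isolation. The following is only a sketch, not the CODES source: the function and struct names are hypothetical, and the sizing formulas (terminals per router = num_routers/2, groups = num_routers * terminals per router + 1) are assumptions inferred from the printed output line, which they happen to reproduce for num_routers = 4.

```c
#include <stdio.h>

/* Hypothetical container for sizes derived from num_routers
 * (routers per group). All formulas below are assumptions inferred
 * from the "Total nodes 72 routers 36 groups 9 radix 8" output. */
struct dfly_sizes {
    int num_cn;        /* terminals per router (assumed num_routers / 2) */
    int num_groups;    /* assumed num_routers * num_cn + 1 */
    int total_routers; /* num_groups * num_routers */
    int total_nodes;   /* total_routers * num_cn */
    int radix;         /* assumed num_cn + num_routers + num_routers / 2 */
};

struct dfly_sizes dfly_sizes_from(int num_routers)
{
    struct dfly_sizes d;
    d.num_cn = num_routers / 2;
    d.num_groups = num_routers * d.num_cn + 1;
    d.total_routers = d.num_groups * num_routers;
    d.total_nodes = d.total_routers * d.num_cn;
    d.radix = d.num_cn + num_routers + num_routers / 2;
    return d;
}

/* Same integer division as the quoted router_id assignment. */
int router_id_for(int terminal_id, int num_routers)
{
    return terminal_id / (num_routers / 2);
}

/* Print the derived sizes in the same shape as the simulation output. */
void print_sizes(int num_routers)
{
    struct dfly_sizes d = dfly_sizes_from(num_routers);
    printf("Total nodes %d routers %d groups %d radix %d\n",
           d.total_nodes, d.total_routers, d.num_groups, d.radix);
}
```

Under these assumptions, router_id_for(1044, 4) gives 522, while a network sized from num_routers = 4 would contain only 36 routers, so a terminal id of 1044 is already outside the range that this num_routers value describes.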
Jonathan Jenkins:
Valgrind is clean on my end, so no memory corruption...
Jonathan Jenkins:
Ok, a couple things so far:
node: 0: error: /nfs2/jenkins/work/CODES/codes/src/util/codes_mapping.c:197: Unable to find LP id given group "DRAGONFLY_GRP", typename "modelnet_dragonfly_router", annotation "<NULL>", repetition 522, and offset 0
This comes from packet_send, line 1263. s->router_id, which is used as the repetition, is 522, but the configuration only has 264 routers. Is the initial calculation of s->router_id suspect?
Misbah Mubarak:
The num_routers entry in the config file was inconsistent with the repetitions, which is why we were getting this issue. There are now some safety checks in the dragonfly model that report when num_routers is inconsistent with the repetitions, so we shouldn't overlook this in the future.
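A check of this kind might look roughly like the sketch below. This is not the check that was added to the dragonfly model; the function names are hypothetical, and the sizing rule (groups = num_routers * (num_routers/2) + 1, total routers = groups * num_routers) is an assumption that matches the simulation output quoted earlier in this thread.

```c
#include <stdio.h>

/* Router count implied by num_routers (routers per group).
 * Assumed sizing rule, not taken from the CODES source. */
int expected_total_routers(int num_routers)
{
    int num_cn = num_routers / 2;              /* terminals per router */
    int num_groups = num_routers * num_cn + 1; /* groups in the network */
    return num_groups * num_routers;
}

/* Returns 1 if the config is consistent, 0 otherwise.
 * `repetitions` is the router repetition count from the LP group config. */
int check_num_routers(int num_routers, int repetitions)
{
    int expected = expected_total_routers(num_routers);
    if (expected != repetitions) {
        fprintf(stderr,
                "num_routers=%d implies %d routers, "
                "but the config has %d repetitions\n",
                num_routers, expected, repetitions);
        return 0;
    }
    return 1;
}
```

With the values from this issue, check_num_routers(4, 264) fails because num_routers = 4 implies 36 routers. Under the same assumed rule, 264 repetitions would correspond to num_routers = 8, though the thread does not state which value the config was corrected to.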
Original Issue Author: Misbah Mubarak
Original Issue ID: 142
Original Issue URL: https://xgitlab.cels.anl.gov/codes/codes/issues/142
When running in serial mode, the CODES checkpoint test fails to find the associated model-net LP and aborts with a fatal error about sending to an invalid MPI rank. IIRC the test used to run fine until we made the modelnet_dragonfly_router changes to the mapping API. To reproduce the error:
./tests/test-checkpoint --sync=1 --codes-config=../tests/conf/test-checkpoint-dfly.conf
This issue is blocking me right now, so I am labeling it as high priority.