Closed pplimport closed 11 years ago
Original author: Abhishek Gupta Original date: 2013-09-12 20:17:35
Does not hang on the net or mpi layer. Does not hang with the Randomized Queue.
Hangs on ibverbs even with RDMA disabled. Hangs on ibverbs even with a version of charm from around a year ago.
Original date: 2013-09-16 19:54:43
Does this hang occur with ibverbs on the last stable release, v6.5.1?
Original author: Abhishek Gupta Original date: 2013-09-17 18:35:38
Yes, it does. Xiang reported that earlier.
Actually today Nikhil and I tested it again, and it did not hang with 6.5.1. It hangs with the latest charm though.
Original date: 2013-09-20 16:11:43
Tracked the issue back to a Makefile change I made. Compilation of sockRoutines.c needed to be passed a compile-time macro, which I lost in my changes to the Makefile. However, why that causes hangs is still unclear. I have sent a mail to Orion/Gengbin to get their opinion on why a CmiTmp* buffer scheme was implemented in sockRoutines.
Original date: 2013-09-24 17:02:04
Which commit to Makefile are you referring to? I'd like to understand what happened a bit better, and possibly see if other bugs are related to this.
Original date: 2013-09-24 17:10:03
commit id: 3795ee9b4d279ddd1d9c95ea3e270fd341df646c
I had modified the Makefile to generate the compilation commands for converse-related files using Make.depends. What I missed was that, for the compilation of sockRoutines, a macro was being defined at compile time, which enables the use of different routines for memory allocation. I am still not sure why this affects correctness, but per Orion it has to do with the handling of stacks. I am looking into it.
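To illustrate the failure mode described above, here is a minimal sketch in C of how a compile-time macro can select between two sets of allocation routines. The names SOCK_USE_TMP_ALLOC, buf_alloc, buf_free and alloc_scheme are hypothetical and are not taken from sockRoutines.c; the toy pool only stands in for a CmiTmp*-style buffer scheme. The point is that if the build system drops the -D flag, the file still compiles without any warning and silently takes the other branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* SOCK_USE_TMP_ALLOC is a hypothetical macro name used for illustration.
   It would normally be supplied on the compiler command line (-D...). */
#ifdef SOCK_USE_TMP_ALLOC

/* Toy stack-like pool standing in for a temporary-buffer scheme. */
static _Alignas(max_align_t) unsigned char tmp_pool[4096];
static size_t tmp_top = 0;

void *buf_alloc(size_t n) {
    const size_t a = _Alignof(max_align_t);
    if (n > sizeof tmp_pool) return NULL;
    n = (n + a - 1) / a * a;              /* round up to keep alignment */
    if (n > sizeof tmp_pool - tmp_top) return NULL;
    void *p = tmp_pool + tmp_top;
    tmp_top += n;
    return p;
}

/* Buffers must be released in reverse order of allocation (LIFO). */
void buf_free(void *p) {
    unsigned char *q = p;
    assert(q >= tmp_pool && q <= tmp_pool + tmp_top);
    tmp_top = (size_t)(q - tmp_pool);
}

const char *alloc_scheme(void) { return "tmp-pool"; }

#else

/* Fallback taken whenever the macro is missing from the compile command. */
void *buf_alloc(size_t n) { return malloc(n); }
void buf_free(void *p) { free(p); }
const char *alloc_scheme(void) { return "malloc"; }

#endif
```

Building this file once with -DSOCK_USE_TMP_ALLOC and once without it produces two objects with the same interface but different allocation behaviour, which is why a lost flag in a generated compile command can go unnoticed until run time.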
Original date: 2013-10-04 03:22:11
No further issue reported. Closing this issue.
Original author: Abhishek Gupta Original issue: https://charm.cs.illinois.edu/redmine/issues/290
Ibverbs hangs with leanmd:
Leanmd with an ibverbs build of charm hangs, while leanmd with a net build of charm works fine.
The charm build is ./build charm++ net-linux-x86_64 ibverbs --with-production -g
The leanmd I use is the latest one from git, configured as follows:
1) In def.h, change lines 21 and 22 to
#define PARTICLES_PER_CELL_START 990
#define PARTICLES_PER_CELL_END 990
2) The command to run leanmd is ./charmrun ++nodelist nodelist +p16 ./leanmd 8 8 8 200 2000 2000
3) I run leanmd on 4 nodes of stampede using 4 cores/node. The submission script looks like
#SBATCH -t 00:30:00
#SBATCH -p development
#SBATCH -N 4
#SBATCH -n 16
#SBATCH -J leanmd