eisungy opened 3 months ago
Hi @eisungy,
We typically don't run PUMI with so few elements per part (MPI rank).
> I'm worried that PUMI didn't work because all meshes are duplicated and they don't have partition map or any partition.
Your concern is correct; without a partition of the mesh none of the PUMI distributed functions will work as expected.
IIRC, there is no guarantee that Zoltan/ParMETIS (used by zsplit) won't create empty parts. We have not tested splitting down to the levels described here. I'd have to see the error logs to say more. For one of the failed cases, would you please provide the input mesh, build info, execution command (split/zsplit and its arguments), and error logs? I can't give an estimate of how soon someone will be able to do a deep dive on the bug, but maybe we'll see something in the error logs.
I can't think of something offhand.
split_err_test.tar.gz
Hi @cwsmith ,
Thank you for your answer. I have uploaded the mesh files along with the error messages returned by split for 48/96/144 parts (the split.err.XX files). All runs printed only the single message below.
(1 << depth) == multiple failed at /home/esyoon/src/core/core-master-20240315/parma/rib/parma_mesh_rib.cc + 69
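For what it's worth, the failed assertion `(1 << depth) == multiple` suggests that the RIB-based split expects the split factor to be an exact power of two, which 48, 96, and 144 are not. Below is a minimal Python sketch of my reading of that check (an illustration, not the actual parma code):

```python
def rib_accepts(multiple):
    """Mirror of the (1 << depth) == multiple assertion: recursive
    inertial bisection halves the mesh repeatedly, so the split
    factor must be an exact power of two."""
    depth = 0
    while (1 << depth) < multiple:
        depth += 1
    return (1 << depth) == multiple

# 48, 96, and 144 all fail the check; 64 or 128 would pass.
for factor in (48, 96, 144, 64, 128):
    print(factor, rib_accepts(factor))
```

If this reading is right, the 48/96/144 failures are expected behavior for split, independent of mesh size.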
I couldn't include the error message from zsplit for the 336-parts case in the attached file, but it is shown below.
APF warning: 9 empty parts
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
numDc+numIso >= 1 failed at /home/esyoon/src/core/core-master-20240315/parma/diffMC/parma_dcpart.cc + 124
The first line is only a warning, but no partitioned output files were produced.
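The empty parts are plausible from the arithmetic alone: 982 elements spread over 336 parts is fewer than three elements per part on average, so any imbalance in the partitioner can leave some parts with zero elements. A quick check (plain arithmetic, no PUMI involved):

```python
elements = 982  # faces of the serial mesh (the 2D elements for this DG code)
for parts in (48, 96, 144, 336):
    # average elements per part; 336 parts gives under 3 per part
    print(parts, elements / parts)
```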
Thank you for your investigation.
Hi. One of the application codes I'm involved with uses split and zsplit to partition a serial mesh with 982 faces. The application code is based on the discontinuous Galerkin method, and the users want to distribute the mesh so that each MPI rank gets about one or two elements. However, for some cases, split and zsplit didn't work. Below is from the parameter scan.
[parameter-scan table of split and zsplit results per part count; table contents lost]
Since the users' computer cluster has 48 cores per compute node, they want to partition the mesh into a multiple of 48 parts. Since the mesh has so few elements, one idea I'm considering is to have every rank load the same entire mesh without partitioning it. But in that case, I'm worried that PUMI won't work, because the mesh would be duplicated on all ranks and would have no partition map or any partitioning.
In sum, I have two questions. Is it expected that split and zsplit fail when the ratio of the number of elements to the total number of MPI ranks is close to 1? Thanks.