Try out equivalent memory usage for a single-node run
The equivalent problem for a single node (128 MPI ranks) is 512x512x512 partitioned 4x4x8, which works; each rank then owns a 128x128x64 block, matching the per-rank load of the 2048x2048x2048 run partitioned 32x16x16 across 8192 ranks.
The largest problem I have been able to run so far on a single Perlmutter CPU node is 512x512x512 with 10^8 particles, partitioned 4x4x8.
Upgraded to PETSc 3.20.4.
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Overflow in integer operation: https://petsc.org/release/faq/#64-bit-indices
[0]PETSC ERROR: Global size overflow 8589934592. You may consider ./configure PETSc with --with-64-bit-indices for the case you are running
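The reported global size 8589934592 is exactly 2048^3 = 2^33, which cannot fit in PETSc's default 32-bit PetscInt (max 2^31 - 1 = 2147483647). A minimal standalone sketch of the wraparound (plain C++, no PETSc):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
    // Global cell count for the 2048^3 grid from the error above.
    const std::int64_t nx = 2048, ny = 2048, nz = 2048;
    const std::int64_t n64 = nx * ny * nz;  // 8589934592 == 2^33, fits in 64 bits
    // Truncating to 32 bits wraps 2^33 to 0, hence PETSc's overflow guard.
    const std::int32_t n32 = static_cast<std::int32_t>(n64);
    std::printf("64-bit size: %lld\n", static_cast<long long>(n64));
    std::printf("32-bit size: %d\n", n32);
}
```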
Still freezes inside KSPCreate(). (KSPCreate is collective on the communicator, so a hang there usually means some ranks never reached the call.)
Tried with BCGSL, same problem.
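For reference, PETSc's BiCGStab(L) can be selected in code or via -ksp_type bcgsl on the command line; a two-line sketch assuming a KSP object ksp already exists:

```cpp
PetscCall(KSPSetType(ksp, KSPBCGSL)); // BiCGStab(L); equivalently -ksp_type bcgsl
PetscCall(KSPSetFromOptions(ksp));    // let command-line options override the default
```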
PETSc with 64-bit indices is VERY slow.
512x512x512 grid, partitioned into 1024 MPI tasks by 16x8x8
atif13 setDomain : 0.00
atif14 setComponent : 0.01
atif15 computeSource : 0.64
atif16 computeAdvection : 1.65
atif17 computeSupersat : 0.02
atif18 setAdvection : 0.00
atif1 NavierStokes solver : 9.67
atif2 Particle Propagate + Vapor temperature : 2.32
atif3 Particle Propagate : 0.00
atif4 FT Add Set TimeStep : 0.00
runtime = 11.99, total runtime = 11.99, time = 0.001709001 step = 1 dt = 0.001968933

atif13 setDomain : 0.00
atif14 setComponent : 0.01
atif15 computeSource : 0.60
atif16 computeAdvection : 153.15
atif17 computeSupersat : 0.02
atif18 setAdvection : 0.00
atif1 NavierStokes solver : 336.79
atif2 Particle Propagate + Vapor temperature : 153.78
atif3 Particle Propagate : 0.00
atif4 FT Add Set TimeStep : 0.00
runtime = 490.57, total runtime = 490.57, time = 0.001709001 step = 1 dt = 0.001968933
Reverting to 32-bit PETSc: the 64-bit build is roughly 40x slower on the first step (490.57 s vs 11.99 s total runtime above) and does not solve the problem anyway.
There is an integer overflow somewhere; changed ilower/iupper to long ints, and it now freezes at VecZeroEntries.
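A hang (rather than an error) at VecZeroEntries is consistent with ranks disagreeing about sizes after a hand-rolled index-type change. For what it's worth, a minimal sketch of the PetscInt-clean pattern for the ownership bounds; ilower/iupper should be PetscInt rather than long int, so their width follows the PETSc build:

```cpp
#include <petscvec.h>

int main(int argc, char **argv)
{
    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

    Vec      x;
    PetscInt N = 8589934592; // 2048^3; requires --with-64-bit-indices
    PetscCall(VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, N, &x));

    // Ownership bounds are PetscInt by definition; storing them in a
    // plain long int only works by accident when the widths line up.
    PetscInt ilower, iupper;
    PetscCall(VecGetOwnershipRange(x, &ilower, &iupper));
    PetscCall(VecZeroEntries(x));

    PetscCall(VecDestroy(&x));
    PetscCall(PetscFinalize());
    return 0;
}
```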
The suspect files are iFluid/iFcartsn3d.cpp and solver/solver.cpp. I know for certain that

iFluid/iFcartsn3d.cpp
576 //solver.Reset_x();
577 //solver.Reset_b();
677 //solver.Set_A(I,I,aII);
678 //solver.Set_b(I, rhs);

solver/solver.cpp
397 ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
398 ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

have overflows. Commenting out the lines above just moves the overflow into GMRES1().
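Assuming Set_A and Set_b are thin wrappers over MatSetValues/VecSetValues (an assumption; I have not re-checked solver.cpp here), the overflow most plausibly enters through int-typed row/column arguments that wrap before they are widened. A sketch of the PetscInt-clean call pattern, with a hypothetical wrapper name mirroring solver.Set_A(I,I,aII):

```cpp
#include <petscmat.h>

// Hypothetical wrapper mirroring solver.cpp's Set_A: the indices stay
// PetscInt end-to-end, so nothing narrows to 32 bits before MatSetValues.
static PetscErrorCode Set_A_sketch(Mat A, PetscInt row, PetscInt col, PetscScalar aIJ)
{
    PetscFunctionBeginUser;
    PetscCall(MatSetValues(A, 1, &row, 1, &col, &aIJ, INSERT_VALUES));
    PetscFunctionReturn(PETSC_SUCCESS);
}
```

Note that MatAssemblyBegin/End themselves mostly communicate stashed off-process entries, so an overflow reported there usually means bad indices were already stashed by earlier MatSetValues calls.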
Trying again with 64-bit PETSc.
Commenting out the problematic lines with 64-bit PETSc throws a different error at a later stage, after Petsc::Solve():
[3729]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[3729]PETSC ERROR: Object is in wrong state
[3729]PETSC ERROR: Matrix is missing diagonal entry 0
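The "missing diagonal entry" error typically comes from a factorization-type preconditioner (ILU/LU) hitting a row with no diagonal entry in the sparsity pattern, which is exactly what commenting out solver.Set_A(I,I,aII) produces. A minimal sketch of keeping the diagonal structurally present even when its value is zero (ADD_VALUES here is an assumption; whichever InsertMode the rest of the assembly uses must be used consistently, since modes cannot be mixed within one assembly):

```cpp
PetscInt rstart, rend;
PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
for (PetscInt row = rstart; row < rend; ++row) {
    // A structural zero keeps the diagonal in the sparsity pattern;
    // ILU-type preconditioners refuse rows with no diagonal entry at all.
    PetscCall(MatSetValue(A, row, row, 0.0, ADD_VALUES));
}
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
```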
Uncommenting the Set_A/Set_b lines brings back the overflow error at ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY); ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
Made a number of changes from int to prdns_int; the integer overflows now turn into a different error:
[4753]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[4753]PETSC ERROR: Argument out of range
[4753]PETSC ERROR: Column too large: col 2954370414328940604 max 8589934591
[4753]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[4753]PETSC ERROR: Option left: name:-d value: 3 source: command line
[4753]PETSC ERROR: Option left: name:-i value: ./climate/input-pr-dns/in-entrainment3dd_case1_vlm_test3 source: command line
[4753]PETSC ERROR: Option left: name:-o value: /pscratch/sd/a/atif/out-gmres3-2048x2048x2048-32x16x16-64bit-prdns-gpunode source: command line
[4753]PETSC ERROR: Option left: name:-p value: 32 source: command line
[4753]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[4753]PETSC ERROR: Petsc Development GIT revision: v3.20.4-620-gb3616e8287d GIT Date: 2024-02-14 22:22:42 +0000
[4753]PETSC ERROR: /global/u1/a/atif/PR_DNS_base/DNS/./climate/climate on a named nid002889 by atif Sun Apr 21 19:03:55 2024
[4753]PETSC ERROR: Configure options --CC=cc --CXX=CC --FC=ftn --prefix=/global/homes/a/atif/packages/petsc-3.20.4-cudaaware-64bit --with-debugging=no COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-64-bit-indices --download-make=1 --download-hdf5=1 --download-hypre=1 --with-shared-libraries --with-static=1 --with-cuda -CUDAC=nvcc
[4753]PETSC ERROR: #1 MatSetValues_MPIAIJ() at /global/u1/a/atif/packages/petsc-3.20.4-gitlab/src/mat/impls/aij/mpi/mpiaij.c:564
[4753]PETSC ERROR: #2 MatSetValues() at /global/u1/a/atif/packages/petsc-3.20.4-gitlab/src/mat/interface/matrix.c:1509
[4762]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
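A column index of 2954370414328940604 (~2.95e18) is far beyond the 8589934591 maximum, which looks like adjacent 32-bit values being reinterpreted as one 64-bit integer, i.e. some call site still hands int-typed index arrays to the 64-bit-index PETSc. A sketch of what prdns_int presumably needs to guarantee (the alias to PetscInt is my assumption about the intent of the typedef):

```cpp
#include <petscsys.h>

// Assumption: prdns_int is the application-side index type introduced for
// this port. Aliasing it to PetscInt keeps its width in lockstep with the
// PETSc build (32-bit by default, 64-bit under --with-64-bit-indices), so
// index arrays can be passed to MatSetValues/VecSetValues unconverted.
typedef PetscInt prdns_int;

// The failure mode being guarded against: an array declared as plain int
// handed to 64-bit-index PETSc gets its bytes read pairwise as 64-bit
// values, producing huge garbage columns like the one in the error above.
```

When printing such indices, PETSc's PetscInt_FMT format macro avoids the matching %d vs %ld pitfall.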
This was run with 64 CPU nodes of Perlmutter.