atif4461 / PR_DNS_base


PETSc Create() freezes for 2048^3 #4

Open atif4461 opened 7 months ago

atif4461 commented 7 months ago

With 64 CPU nodes of Perlmutter

atif4461 commented 7 months ago

Try out equivalent memory usage for a single node run

atif4461 commented 7 months ago

The equivalent problem for a single node (128 MPI ranks) is 512x512x512 partitioned as 4x4x8, which works.

The largest problem that I have been able to run so far on a single Perlmutter CPU node is 512x512x512 with 10^8 particles partitioned by 4x4x8.
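The "equivalent problem" claim can be checked with a little arithmetic. The sketch below assumes the 2048^3 run uses a 32x16x16 decomposition (taken from the output directory name in a later comment) and that every dimension divides evenly; `points_per_rank` is a hypothetical helper, not a function from this code base.

```cpp
#include <cassert>
#include <cstdint>

// Grid points owned by each rank when an nx x ny x nz grid is split evenly
// across a px x py x pz process grid. Assumes each dimension divides evenly.
std::int64_t points_per_rank(std::int64_t nx, std::int64_t ny, std::int64_t nz,
                             std::int64_t px, std::int64_t py, std::int64_t pz) {
    assert(nx % px == 0 && ny % py == 0 && nz % pz == 0);
    return (nx / px) * (ny / py) * (nz / pz);
}
```

Under those assumptions both `points_per_rank(512, 512, 512, 4, 4, 8)` and `points_per_rank(2048, 2048, 2048, 32, 16, 16)` give 1048576 points per rank, so the per-rank load matches; what differs between the two runs is the size of the global indices.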

atif4461 commented 7 months ago

Upgraded to PETSc 3.20.4

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Overflow in integer operation: https://petsc.org/release/faq/#64-bit-indices
[0]PETSC ERROR: Global size overflow 8589934592. You may consider ./configure PETSc with --with-64-bit-indices for the case you are running
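The number in the message is exactly 2048^3 = 2^33, which is larger than the biggest value a signed 32-bit index can hold (2^31 - 1 = 2147483647). A minimal sketch of that check follows; the helper names are hypothetical, and it relies only on the fact that PetscInt is 32 bits wide unless PETSc is configured with --with-64-bit-indices.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Total number of grid points for an nx x ny x nz grid, computed in 64-bit
// arithmetic so the product does not wrap for the sizes discussed here.
std::int64_t global_size(std::int64_t nx, std::int64_t ny, std::int64_t nz) {
    assert(nx > 0 && ny > 0 && nz > 0);
    return nx * ny * nz;
}

// True if the count fits in a signed 32-bit index.
bool fits_in_int32(std::int64_t n) {
    return n <= std::numeric_limits<std::int32_t>::max();
}
```

`global_size(2048, 2048, 2048)` is 8589934592 and does not fit, while `global_size(512, 512, 512)` is 134217728 and does, which matches the single-node case working.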

atif4461 commented 7 months ago

Still freezes inside KSPCreate()

atif4461 commented 6 months ago

Tried with BCGSL, same problem.

atif4461 commented 6 months ago

PETSc with 64-bit indices is VERY slow.

512x512x512 grid, partitioned into 1024 MPI tasks by 16x8x8

==================== 32 bit indices

atif13 setDomain : 0.00
atif14 setComponent : 0.01
atif15 computeSource : 0.64
atif16 computeAdvection : 1.65
atif17 computeSupersat : 0.02
atif18 setAdvection : 0.00

atif1 NavierStokes solver : 9.67
atif2 Particle Propagate + Vapor temperature : 2.32
atif3 Particle Propagate : 0.00
atif4 FT Add Set TimeStep : 0.00
runtime = 11.99, total runtime = 11.99, time = 0.001709001 step = 1 dt = 0.001968933

==================== 64 bit indices

atif13 setDomain : 0.00
atif14 setComponent : 0.01
atif15 computeSource : 0.60
atif16 computeAdvection : 153.15
atif17 computeSupersat : 0.02
atif18 setAdvection : 0.00

atif1 NavierStokes solver : 336.79
atif2 Particle Propagate + Vapor temperature : 153.78
atif3 Particle Propagate : 0.00
atif4 FT Add Set TimeStep : 0.00
runtime = 490.57, total runtime = 490.57, time = 0.001709001 step = 1 dt = 0.001968933

Reverting to 32-bit PETSc: the 64-bit build is about 41 times slower overall (total runtime 490.57 vs 11.99) and does not solve the problem.

atif4461 commented 6 months ago

There is an integer overflow somewhere. Changed ilower and iupper to long ints; the run now freezes at VecZeroEntries.

iFluid/iFcartsn3d.cpp solver/solver.cpp
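To illustrate why ilower and iupper need a 64-bit type at this problem size, here is a minimal sketch of a per-rank row range. It assumes rows are split into equal contiguous blocks across ranks, which may not be how solver.cpp actually assigns them; `row_range` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Inclusive [ilower, iupper] row range owned by `rank` when n_global rows are
// split into equal contiguous blocks across n_ranks ranks (an assumption).
std::pair<std::int64_t, std::int64_t> row_range(std::int64_t n_global,
                                                std::int64_t n_ranks,
                                                std::int64_t rank) {
    assert(n_ranks > 0 && rank >= 0 && rank < n_ranks);
    assert(n_global % n_ranks == 0);
    const std::int64_t rows_per_rank = n_global / n_ranks;
    const std::int64_t ilower = rank * rows_per_rank;
    return {ilower, ilower + rows_per_rank - 1};
}
```

With 8589934592 rows over 8192 ranks, the last rank owns rows 8588886016 through 8589934591. Under this layout, ilower already exceeds the 32-bit limit from rank 2048 onward (2048 x 1048576 = 2^31), so every index derived from these bounds has to be 64-bit as well, not only the bounds themselves.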

atif4461 commented 6 months ago

The following lines are known to overflow:

iFluid/iFcartsn3d.cpp
576 //solver.Reset_x();
577 //solver.Reset_b();
677 //solver.Set_A(I,I,aII);
678 //solver.Set_b(I, rhs);

solver/solver.cpp
397 ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
398 ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);

With these lines commented out, the run overflows at GMRES1() instead.

Trying again with 64 bit PETSc.

atif4461 commented 6 months ago

Commenting out the problematic lines with 64-bit PETSc throws a different error at a later stage, after Petsc::Solve():

[3729]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[3729]PETSC ERROR: Object is in wrong state
[3729]PETSC ERROR: Matrix is missing diagonal entry 0

Uncommenting those lines brings back the overflow error at:

ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
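The "Matrix is missing diagonal entry 0" message is consistent with `solver.Set_A(I,I,aII)` being commented out, since that call appears to be the one that inserts the diagonal coefficient. This is an inference from the call's arguments, not something confirmed in the thread. The toy sketch below, which does not use PETSc and whose names are hypothetical, shows the kind of check that reports such rows.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Toy sparsity pattern: for each row, the set of columns holding an entry.
using Pattern = std::map<std::int64_t, std::set<std::int64_t>>;

// Rows in [0, n) that have no stored (row, row) entry.
std::vector<std::int64_t> rows_missing_diagonal(const Pattern& pattern,
                                                std::int64_t n) {
    assert(n >= 0);
    std::vector<std::int64_t> missing;
    for (std::int64_t row = 0; row < n; ++row) {
        const auto it = pattern.find(row);
        if (it == pattern.end() || it->second.count(row) == 0) {
            missing.push_back(row);
        }
    }
    return missing;
}
```

If only off-diagonal entries are ever set for row 0, this helper reports row 0. That suggests the later error is a side effect of the commented-out lines, not a separate bug.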

atif4461 commented 5 months ago

Changed a number of variables from int to prdns_int; the integer overflows are replaced by a different error:

[4753]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[4753]PETSC ERROR: Argument out of range
[4753]PETSC ERROR: Column too large: col 2954370414328940604 max 8589934591
[4753]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[4753]PETSC ERROR: Option left: name:-d value: 3 source: command line
[4753]PETSC ERROR: Option left: name:-i value: ./climate/input-pr-dns/in-entrainment3dd_case1_vlm_test3 source: command line
[4753]PETSC ERROR: Option left: name:-o value: /pscratch/sd/a/atif/out-gmres3-2048x2048x2048-32x16x16-64bit-prdns-gpunode source: command line
[4753]PETSC ERROR: Option left: name:-p value: 32 source: command line
[4753]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[4753]PETSC ERROR: Petsc Development GIT revision: v3.20.4-620-gb3616e8287d GIT Date: 2024-02-14 22:22:42 +0000
[4753]PETSC ERROR: /global/u1/a/atif/PR_DNS_base/DNS/./climate/climate on a named nid002889 by atif Sun Apr 21 19:03:55 2024
[4753]PETSC ERROR: Configure options --CC=cc --CXX=CC --FC=ftn --prefix=/global/homes/a/atif/packages/petsc-3.20.4-cudaaware-64bit --with-debugging=no COPTFLAGS="-O3 -march=native -mtune=native" CXXOPTFLAGS="-O3 -march=native -mtune=native" FOPTFLAGS="-O3 -march=native -mtune=native" --with-64-bit-indices --download-make=1 --download-hdf5=1 --download-hypre=1 --with-shared-libraries --with-static=1 --with-cuda -CUDAC=nvcc
[4753]PETSC ERROR: #1 MatSetValues_MPIAIJ() at /global/u1/a/atif/packages/petsc-3.20.4-gitlab/src/mat/impls/aij/mpi/mpiaij.c:564
[4753]PETSC ERROR: #2 MatSetValues() at /global/u1/a/atif/packages/petsc-3.20.4-gitlab/src/mat/interface/matrix.c:1509
[4762]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
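The reported maximum, 8589934591, is 2048^3 - 1, so the matrix size itself is now correct; the column value is what is wrong. A value that far outside the valid range looks more like an uninitialized or mis-typed index than a simple wraparound, though the thread does not confirm the cause. A minimal sketch of one way to narrow it down is to validate indices before they are handed to the matrix. It assumes prdns_int is a 64-bit signed integer and a k-fastest index layout, neither of which is shown in the thread; both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: prdns_int is a 64-bit signed integer; the thread does not show
// its actual definition.
using prdns_int = std::int64_t;

// Flatten (i, j, k) into a global index for a grid with ny x nz points in the
// two fastest dimensions (k fastest). The first operand is widened before any
// multiplication, so every intermediate product is computed in 64 bits.
prdns_int flat_index(int i, int j, int k, int ny, int nz) {
    assert(i >= 0 && j >= 0 && k >= 0 && j < ny && k < nz);
    return (static_cast<prdns_int>(i) * ny + j) * nz + k;
}

// Position of the first column index outside [0, n_global), or -1 if every
// index is valid. A debugging helper, not part of PETSc.
std::ptrdiff_t first_out_of_range(const std::vector<prdns_int>& cols,
                                  prdns_int n_global) {
    for (std::size_t p = 0; p < cols.size(); ++p) {
        if (cols[p] < 0 || cols[p] >= n_global) {
            return static_cast<std::ptrdiff_t>(p);
        }
    }
    return -1;
}
```

For the last grid point, `flat_index(2047, 2047, 2047, 2048, 2048)` gives 8589934591, the maximum named in the error. Running `first_out_of_range` on each stencil row just before insertion would point at the call site that produces a value such as 2954370414328940604.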