Open hapfang opened 6 years ago
The phasta and phastaChef drivers being used:
$ cat run_phastaChef.sh # on the phasta viz node
mpirun -np 4 ~/phastaChef-develop/phastaChef-ThetaDamBreak/build/chefPhastaICLoop_sam_adapt_sz 4 samAdaptLoop_t7_p1.inp 2>&1 | tee phastaChef.log
$ cat run_phasta.sh
mpirun -np 4 /users/fang/git_phasta-dev-CMake/build-next-viz-phastaChefSupport/bin/phastaIC.exe
The phasta next-LSorig+pinning branch has a few local commits in your build. I assume these are required?
+cwsmith@jumpgate: /users/fang/git_phasta-dev-CMake/phasta-next (next-LSorig+pinning)$ git log --decorate --oneline -10
590f8b0 (HEAD, next-LSorig+pinning) Put back the subroutine get_phasic_vol
162aa99 Saving phasic volumes into vhist.dat
d626845 (origin/next-LSorig+pinning) removing the write of restart at the start of stream based adaptivity. Also return flowDiag computation to standard form which does not reproduce serial exactly as can be useful for debugging
When I build and run the stack with GCC 4.4.5 on a Debian Squeeze SCOREC system (avatar), execution fails with a memory error in the call to AsiGMR. Note that all processes hit the same error.
Would you please run under valgrind on the viz nodes?
Below are my environment and build script for phastaChef:
$ cat envDebianSqueeze.sh
module load cmake/latest pumi git
$ cat configPhastaChefSqueeze.sh
#!/bin/bash
flags='-O2 -g -Wall -Wextra'
cmake /lore/cwsmith/killme/phastachef-jun/phastaChef \
-DCMAKE_C_COMPILER=mpicc \
-DCMAKE_CXX_COMPILER=mpicxx \
-DCMAKE_Fortran_COMPILER=gfortran \
-DCMAKE_C_FLAGS="${flags}" \
-DCMAKE_CXX_FLAGS="${flags}" \
-DCMAKE_Fortran_FLAGS="${flags}" \
-DSCOREC_PREFIX=$PUMI_INSTALL_DIR \
-DPHASTA_SRC_DIR=/lore/cwsmith/killme/phastachef-jun/phasta-next \
-DPHASTA_INCOMPRESSIBLE=ON \
-DPHASTA_COMPRESSIBLE=ON \
-DPHASTA_USE_PETSC=OFF \
-DPHASTA_USE_SVLS=ON \
-DPHASTA_USE_LESLIB=ON \
-DPHASTA_TESTING=ON \
-DLESLIB=/users/cwsmith/develop/libLes/libles_gcc.a \
-DCASES=/lore/cwsmith/cdash/phastaChefTests
The relevant parts of the valgrind output from one process are below.
==28815== Invalid read of size 8
==28815== at 0x437600: input_fform(phSolver::Input&) (input_fform.cc:558)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36e8 is 0 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815==
==28815== Invalid read of size 8
==28815== at 0x43760A: input_fform(phSolver::Input&) (input_fform.cc:559)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36f0 is 8 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815==
==28815== Invalid read of size 8
==28815== at 0x437613: input_fform(phSolver::Input&) (input_fform.cc:560)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36f8 is 16 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815==
==28815== Invalid read of size 8
==28815== at 0x4376E8: input_fform(phSolver::Input&) (input_fform.cc:566)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36e8 is 0 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815==
==28815== Invalid read of size 8
==28815== at 0x4376F2: input_fform(phSolver::Input&) (input_fform.cc:567)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36f0 is 8 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815==
==28815== Invalid read of size 8
==28815== at 0x4376FB: input_fform(phSolver::Input&) (input_fform.cc:568)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Address 0x16bf36f8 is 16 bytes after a block of size 24 alloc'd
==28815== at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)
==28815== by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)
==28815== by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)
==28815== by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)
==28815== by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)
==28815== Warning: silly arg (-6832023353530777600) to malloc()
==28815== Invalid free() / delete / delete[] / realloc()
==28815== at 0x4C2578E: free (vg_replace_malloc.c:446)
==28815== by 0x10E5571A: free_mem (in /lib/libc-2.11.3.so)
==28815== by 0x10E552B1: __libc_freeres (in /lib/libc-2.11.3.so)
==28815== by 0x4A206BD: _vgnU_freeres (vg_preloaded.c:62)
==28815== by 0x10D7131D: __run_exit_handlers (exit.c:93)
==28815== by 0x10D713C4: exit (exit.c:100)
==28815== by 0x102BBACB: ??? (in /usr/lib/libgfortran.so.3.0.0)
==28815== by 0x102BBF7B: _gfortran_os_error (in /usr/lib/libgfortran.so.3.0.0)
==28815== by 0x102BC21E: ??? (in /usr/lib/libgfortran.so.3.0.0)
==28815== by 0x103427B2: ??? (in /usr/lib/libgfortran.so.3.0.0)
==28815== by 0x4EAEBE: elmgmr_ (elmgmr.f:261)
==28815== by 0x4AA0CA: solflow_ (solfar.f:144)
==28815== Address 0x404a7a8 is not stack'd, malloc'd or (recently) free'd
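One way to read the invalid-read entries, as a rough sketch: the 24-byte block is the storage of a `std::vector<double>` assigned at input_fform.cc:275, so, assuming 8-byte doubles, it holds three elements. The reads at input_fform.cc:558-560 and 566-568 land 0, 8, and 16 bytes past the end of that block. The arithmetic below maps those offsets to element indices; the variable names are illustrative only.

```shell
# Values copied from the valgrind report above.
block_bytes=24   # "block of size 24 alloc'd"
read_bytes=8     # "Invalid read of size 8", i.e. one double (assumed 8 bytes)

# Number of doubles the vector actually holds.
n_elems=$((block_bytes / read_bytes))
echo "vector holds $n_elems doubles (valid indices 0..$((n_elems - 1)))"

# Offsets reported as "N bytes after a block of size 24".
for off in 0 8 16; do
  idx=$(( (block_bytes + off) / read_bytes ))
  echo "read at $off bytes past the block -> index $idx"
done
```

Under these assumptions the code reads indices 3 through 5 of a three-element vector. That would be consistent with an input entry supplying three values where six are read, but this is an inference and the report alone does not confirm it.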
Hi Cameron, sorry for the delayed reply on this issue. This is no longer urgent, as we are pushing the phastaChef runs with BL adapt freezing. However, it would be very nice to have phastaChef also work on two-phase flow cases with a mixed mesh. I've just redone a quick test, and the issue is reproducible for the redistancing after phastaChef adapt. The local commits in my phasta-next repo contain bug fixes from Ken's and my recent work, which are probably not directly related to the reported issue.
As for the valgrind results, could you please share the command you used to generate the valgrind error report shown above? I am not familiar with valgrind.
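For reference, a memcheck report like the one above is typically produced by placing `valgrind` between `mpirun` and the executable. The sketch below is not necessarily the command used for that report. The helper name `vg_cmd`, the `NP`, `EXE`, and `DRY_RUN` variables, and the `vg.%p.log` file name are placeholders chosen for illustration. It assumes a build with `-g` so that valgrind can print source line numbers, and paths without spaces.

```shell
#!/bin/sh
# Placeholders: adjust to the case being debugged.
NP=${NP:-4}
EXE=${EXE:-./chefPhastaICLoop_sam_adapt_sz 4 samAdaptLoop_t7_p1.inp}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, anything else = run it

# Build the command line. memcheck is valgrind's default tool.
# --track-origins=yes reports where uninitialised values came from.
# %p in --log-file expands to the process ID, so each MPI rank gets its own log.
vg_cmd() {
  np=$1
  shift
  printf 'mpirun -np %s valgrind --track-origins=yes --log-file=vg.%%p.log %s\n' "$np" "$*"
}

cmd=$(vg_cmd "$NP" $EXE)
echo "$cmd"

if [ "$DRY_RUN" != "1" ]; then
  # Unquoted on purpose so the string splits into words (assumes no spaces in paths).
  $cmd
fi
```

With `DRY_RUN=0` this writes one `vg.<pid>.log` file per rank. Expect the run to be much slower than normal, so a small case and few time steps are advisable.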
Hi Cameron, as Ken mentioned in a previous email, I have encountered a divergence issue when trying to run phastaChef with the mixed mesh. The residuals reported in the phSolver output are all NaNs for the redistancing steps.
It is important to note that the same mixed mesh works fine when we run PHASTA on it directly.
You mentioned, "I suspect we didn't add support for multiple topology blocks with streams." Could you please help us double-check that? To facilitate your debugging, I have made a tarball of the test case, downloadable via the link below:
https://drive.google.com/file/d/1AibsskiPTMbSINrzIDeDHTWbZYFTAhGw/view?usp=sharing
If you prefer to test on the viz node, the test case can be found at
/users/fang/annularFlow/mixedMeshTest/1-1-Chef/4-1-Chef/phastaChefTest
You can find the case details in README:
Thank you so much for the help.