SCOREC / core

parallel finite element unstructured meshes

phastaChef issue with mixed mesh #179

Open hapfang opened 6 years ago

hapfang commented 6 years ago

Hi Cameron, as Ken mentioned in a previous email, I have encountered a divergence issue when running phastaChef with a mixed mesh. The residuals reported in the phSolver output are all NaN for the redistancing steps.

 stopjob,lstep,istep          -1           0           0
     1 7.422E-01 5.009E-05  (   0)   5.200E-02   1.317E+25  <   617-    3|  12> [  20 -  69]
     1 8.475E-01 3.209E-06           3.626E-02                               [    29]
     1 1.276E+00 1.756E-05  (  -4)   2.293E-02   8.268E-01  <   220-    4|  12> [  17 -  59]
     1 1.397E+00 9.681E-08           1.955E-03                               [    31]
     1 1.834E+00 1.397E-05  (  -5)   1.672E-02   6.644E-01  <   617-    3|  12> [  17 -  62]
     1 1.953E+00 6.644E-08           1.211E-03                               [    30]
     1 2.374E+00 7.209E-06  (  -8)   1.107E-02   4.648E-01  <  3445-    2|  11> [  14 -  55]
     1 2.492E+00 5.264E-08           1.131E-03                               [    29]
     1 2.557E+00 8.051E-12                 NaN                               [     0]
     1 2.639E+00 7.364E-12                 NaN                               [     0]
     1 2.720E+00 7.192E-12                 NaN                               [     0]
     1 2.802E+00 6.926E-12                 NaN                               [     0]

Note that the same mixed mesh works fine when we run PHASTA on it directly.

 stopjob,lstep,istep          -1           0           0
     1 3.884E-01 5.009E-05  (   0)   5.850E-02   1.643E+25  <   711-    3|  12> [  26 -  76]
     1 4.458E-01 3.335E-06           4.456E-02                               [    33]
     1 7.153E-01 1.715E-05  (  -4)   2.422E-02   6.616E-01  <   217-    4|  12> [  23 -  65]
     1 7.777E-01 1.035E-07           2.347E-03                               [    36]
     1 1.021E+00 1.329E-05  (  -5)   1.556E-02   3.994E-01  <   711-    3|  13> [  25 -  69]
     1 1.071E+00 4.461E-08           8.297E-04                               [    34]
     1 1.282E+00 6.481E-06  (  -8)   8.594E-03   2.857E-01  <  3458-    2|  12> [  19 -  61]
     1 1.331E+00 2.486E-08           4.892E-04                               [    33]
     1 1.361E+00 9.770E-12           1.972E-02                               [     0]
     1 1.398E+00 9.049E-12           1.621E-02                               [     0]
     1 1.435E+00 8.836E-12           1.572E-02                               [     0]
     1 1.474E+00 8.574E-12           1.482E-02                               [     0]
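
To find where a run first goes bad without reading the whole log, the first residual line containing NaN can be pulled out with a small filter. This is a minimal sketch, assuming residual lines begin with a numeric step field as in the excerpts above; the function name `first_nan_line` is hypothetical, and the log file is whatever the run was tee'd to.

```shell
#!/bin/bash
# Print the first residual line in a phSolver log that has a NaN field.
# Assumption: residual lines start with a numeric field, as in the
# excerpts above; header lines such as "stopjob,lstep,istep" are skipped.
first_nan_line() {
  awk '$1 ~ /^[0-9]+$/ {
         for (i = 2; i <= NF; i++)
           if ($i == "NaN") { print NR ": " $0; exit }
       }' "$1"
}
```

Calling `first_nan_line phastaChef.log` prints the log line number followed by the offending line, and prints nothing if no residual line contains NaN.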

You mentioned, "I suspect we didn't add support for multiple topology blocks with streams." Could you please help us double-check that? To help with debugging, I have made a tarball of the test case, downloadable via the link below:

https://drive.google.com/file/d/1AibsskiPTMbSINrzIDeDHTWbZYFTAhGw/view?usp=sharing

If you prefer to test on viznode, the test case can be found at /users/fang/annularFlow/mixedMeshTest/1-1-Chef/4-1-Chef/phastaChefTest

You can find the case details in the README:

  The folder contains
  mdsMesh_bz2/            : the mds mesh for phastaChef run
  4-procs_case/           : the initial solution to migrate
  geom.smd                : Simmetrix model
  geom_nat.x_t            : native ParaSolid model
  solver.inp              : customized PHASTA input parameters
  input.config            : default PHASTA input parameters
  samAdaptLoop_t7_p1.inp  : customized Chef input parameters
  *.pht                   : PHASTA config file for PV viz
  meshSz.pvsm             : PV state file

  The PHASTA branch used  : next-LSorig+pinning
  The Chef repo           : latest on github
  The SimModSuite version : 14.0-180813dev

Thank you so much for the help.

cwsmith commented 6 years ago

The phasta and phastaChef drivers being used:

$ cat run_phastaChef.sh  # on the phasta viz node
mpirun -np 4 ~/phastaChef-develop/phastaChef-ThetaDamBreak/build/chefPhastaICLoop_sam_adapt_sz 4 samAdaptLoop_t7_p1.inp 2>&1 | tee phastaChef.log

$ cat run_phasta.sh
mpirun -np 4 /users/fang/git_phasta-dev-CMake/build-next-viz-phastaChefSupport/bin/phastaIC.exe 

The phasta next-LSorig+pinning branch has a few local commits in your build. I assume these are required?

+cwsmith@jumpgate: /users/fang/git_phasta-dev-CMake/phasta-next (next-LSorig+pinning)$ git log --decorate --oneline -10
590f8b0 (HEAD, next-LSorig+pinning) Put back the subroutine get_phasic_vol
162aa99 Saving phasic volumes into vhist.dat
d626845 (origin/next-LSorig+pinning) removing the write of restart at the start of stream based adaptivity. Also return flowDiag computation to standard form which does not reproduce serial exactly as can be useful for debugging

cwsmith commented 6 years ago

When I build and run the stack with GCC 4.4.5 on a Debian Squeeze SCOREC system (avatar), execution fails with a memory error in the call to AsiGMR. Note that all processes hit the same error.

Would you please run under valgrind on the viz nodes?

Below are my environment and build script for phastaChef:

$ cat envDebianSqueeze.sh 
module load cmake/latest pumi git
$ cat configPhastaChefSqueeze.sh 
#!/bin/bash
flags='-O2 -g -Wall -Wextra'
cmake /lore/cwsmith/killme/phastachef-jun/phastaChef \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_CXX_COMPILER=mpicxx \
  -DCMAKE_Fortran_COMPILER=gfortran \
  -DCMAKE_C_FLAGS="${flags}" \
  -DCMAKE_CXX_FLAGS="${flags}" \
  -DCMAKE_Fortran_FLAGS="${flags}" \
  -DSCOREC_PREFIX=$PUMI_INSTALL_DIR \
  -DPHASTA_SRC_DIR=/lore/cwsmith/killme/phastachef-jun/phasta-next \
  -DPHASTA_INCOMPRESSIBLE=ON \
  -DPHASTA_COMPRESSIBLE=ON \
  -DPHASTA_USE_PETSC=OFF \
  -DPHASTA_USE_SVLS=ON \
  -DPHASTA_USE_LESLIB=ON \
  -DPHASTA_TESTING=ON \
  -DLESLIB=/users/cwsmith/develop/libLes/libles_gcc.a \
  -DCASES=/lore/cwsmith/cdash/phastaChefTests

The relevant parts of the valgrind output from one process are below.

==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x437600: input_fform(phSolver::Input&) (input_fform.cc:558)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36e8 is 0 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                            
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==                                                                                                                                                                                                                                                                                            
==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x43760A: input_fform(phSolver::Input&) (input_fform.cc:559)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36f0 is 8 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                            
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==                                                                                                                                                                                                                                                                                            
==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x437613: input_fform(phSolver::Input&) (input_fform.cc:560)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36f8 is 16 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                           
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==                                                                                                                                                                                                                                                                                            
==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x4376E8: input_fform(phSolver::Input&) (input_fform.cc:566)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36e8 is 0 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                            
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==                                                                                                                                                                                                                                                                                            
==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x4376F2: input_fform(phSolver::Input&) (input_fform.cc:567)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36f0 is 8 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                            
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==                                                                                                                                                                                                                                                                                            
==28815== Invalid read of size 8                                                                                                                                                                                                                                                                     
==28815==    at 0x4376FB: input_fform(phSolver::Input&) (input_fform.cc:568)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)                                                                                                                                                                                                                                  
==28815==  Address 0x16bf36f8 is 16 bytes after a block of size 24 alloc'd                                                                                                                                                                                                                           
==28815==    at 0x4C2695A: operator new(unsigned long) (vg_replace_malloc.c:298)                                                                                                                                                                                                                     
==28815==    by 0x4463DF: std::vector<double, std::allocator<double> >::operator=(std::vector<double, std::allocator<double> > const&) (new_allocator.h:89)                                                                                                                                          
==28815==    by 0x43509B: input_fform(phSolver::Input&) (input_fform.cc:275)                                                                                                                                                                                                                         
==28815==    by 0x424668: (anonymous namespace)::run(phSolver::Input&) (phasta.cc:108)                                                                                                                                                                                                               
==28815==    by 0x41F098: main (chef_phasta_sam_adaptLoop_sz.cc:76)     

==28815== Warning: silly arg (-6832023353530777600) to malloc()                                                                                                                                                                                                                                      
==28815== Invalid free() / delete / delete[] / realloc()                                                                                                                                                                                                                                             
==28815==    at 0x4C2578E: free (vg_replace_malloc.c:446)                                                                                                                                                                                                                                            
==28815==    by 0x10E5571A: free_mem (in /lib/libc-2.11.3.so)                                                                                                                                                                                                                                        
==28815==    by 0x10E552B1: __libc_freeres (in /lib/libc-2.11.3.so)                                                                                                                                                                                                                                  
==28815==    by 0x4A206BD: _vgnU_freeres (vg_preloaded.c:62)                                                                                                                                                                                                                                         
==28815==    by 0x10D7131D: __run_exit_handlers (exit.c:93)                                                                                                                                                                                                                                          
==28815==    by 0x10D713C4: exit (exit.c:100)                                                                                                                                                                                                                                                        
==28815==    by 0x102BBACB: ??? (in /usr/lib/libgfortran.so.3.0.0)                                                                                                                                                                                                                                   
==28815==    by 0x102BBF7B: _gfortran_os_error (in /usr/lib/libgfortran.so.3.0.0)                                                                                                                                                                                                                    
==28815==    by 0x102BC21E: ??? (in /usr/lib/libgfortran.so.3.0.0)                                                                                                                                                                                                                                   
==28815==    by 0x103427B2: ??? (in /usr/lib/libgfortran.so.3.0.0)                                                                                                                                                                                                                                   
==28815==    by 0x4EAEBE: elmgmr_ (elmgmr.f:261)                                                                                                                                                                                                                                                     
==28815==    by 0x4AA0CA: solflow_ (solfar.f:144)                                                                                                                                                                                                                                                    
==28815==  Address 0x404a7a8 is not stack'd, malloc'd or (recently) free'd
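
In the trace above, the invalid reads occur at input_fform.cc:558-560 and 566-568, and each one lands just past a 24-byte block (three 8-byte doubles) allocated at input_fform.cc:275. A long valgrind log can be condensed to that kind of summary with a small filter. The sketch below assumes the standard memcheck layout, where the frame right after "Invalid read of size N" is the faulting `at 0x...` line; the function name `invalid_read_sites` is hypothetical.

```shell
#!/bin/bash
# Count valgrind "Invalid read" reports per faulting source location.
# Assumption: standard memcheck layout, i.e. the first "at 0x..." frame
# after an "Invalid read of size N" line ends in "(file:line)".
invalid_read_sites() {
  awk '/Invalid read of size/ { want = 1; next }
       want && / at 0x/ {
         loc = $NF                 # last field, e.g. "(input_fform.cc:558)"
         gsub(/[()]/, "", loc)     # strip the parentheses
         count[loc]++
         want = 0
       }
       END { for (loc in count) print count[loc], loc }' "$1" | sort -k2
}
```

Running `invalid_read_sites` on one rank's log prints one line per source location, prefixed by the number of reports at that location.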

hapfang commented 6 years ago

Hi Cameron, sorry for the delayed reply on this issue. This is no longer urgent, as we are pushing the phastaChef runs with BL adapt freezing. However, it would be very nice to have phastaChef also work on two-phase flow cases with a mixed mesh. I've just redone a quick test, and the issue is reproducible for the redistancing after phastaChef adapt. The local commits in my phasta-next repo contain bug fixes from recent work by Ken and me, which are probably not directly related to the reported issue.

As for the valgrind results, could you please share the command you used to generate the valgrind error report you showed above? I am not familiar with valgrind.
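
For reference, a common way to get a report like this is to launch every MPI rank under valgrind. The sketch below shows a typical invocation and is not necessarily the command used earlier in this thread. It assumes `valgrind` and `mpirun` are on the PATH and that the build has debug symbols (`-g`, as in the configure script above). The helper name `vg_mpirun_cmd`, the log file name, and the relative executable path are hypothetical.

```shell
#!/bin/bash
# Print an mpirun command line that wraps each rank in valgrind (memcheck).
# Assumptions: valgrind and mpirun are installed; the executable and input
# file mirror run_phastaChef.sh; the log file name is illustrative only.
vg_mpirun_cmd() {
  local nprocs=$1; shift
  # %p in --log-file expands to the process ID, so each rank gets its own log.
  # --track-origins=yes reports where uninitialised values came from.
  echo mpirun -np "$nprocs" valgrind --track-origins=yes \
    --log-file=valgrind.%p.log "$@"
}

# Show the command for the 4-process case; copy it or pipe it to sh to run it.
vg_mpirun_cmd 4 ./chefPhastaICLoop_sam_adapt_sz 4 samAdaptLoop_t7_p1.inp
```

The function only prints the command, so it can be inspected before launching. Runs under valgrind are usually much slower than normal runs, so a small case is preferable.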