danieljprice / phantom

Phantom Smoothed Particle Hydrodynamics and Magnetohydrodynamics code
https://phantomsph.github.io

MPI produces different results than openMP #37

Closed jameswurster closed 3 years ago

jameswurster commented 4 years ago

To test MPI, I modelled the gravitational collapse of a magnetised sphere. My runs were pure openMP (mislabelled as 'hybrid' on the line plots) and hybrid MPI+openMP (labelled as MPI). The attached graphs show the kinetic and potential energies over the early stages of the collapse, and the density at dump 10 as plotted by splash (red = openMP, black = hybrid). Clearly the results are not the same, suggesting there is a bug in the MPI implementation.

[Attached plots: jets_02ekin, jets_05epot, rho10]

danieljprice commented 4 years ago

would it be possible to post the files needed to reproduce this issue?

jameswurster commented 4 years ago

These simulations were produced using the included .in and .setup files. (Due to a small oversight, they do not use the normal defaults for the collapse simulations, but both tests used the same values.)

The setup block is SETUP=jet. All tests were run locally on my MacBook Pro, compiled with SYSTEM=gfortran.

For openMP:

export OMP_NUM_THREADS=6
./phantom jet.in

For hybrid (compiling with MPI=yes):

export OMP_NUM_THREADS=3
mpiexec -np 2 ./phantom jet.in

jet.setup:

# input file for sphere-in-box setup routines

# units
           dist_unit =    1.0d16cm    ! distance unit (e.g. au)
           mass_unit =      solarm    ! mass unit (e.g. solarm)

# resolution
                  np =      300000    ! requested number of particles in sphere

# options for box
                xmin =      -8.000    ! x min
                xmax =       8.000    ! x max
                ymin =      -8.000    ! y min
                ymax =       8.000    ! y max
                zmin =      -8.000    ! z min
                zmax =       8.000    ! z max

# intended result
         form_binary =           F    ! the intent is to form a central binary

# options for sphere
       use_BE_sphere =           F    ! centrally condense as a BE sphere
            r_sphere =       4.000    ! radius of sphere in code units
    density_contrast =         30.    ! density contrast in code units
      totmass_sphere =       1.000    ! mass of sphere in code units
       cs_sphere_cgs =   2.189E+04    ! sound speed in sphere in cm/s
              angvel =   1.770E-13    ! angular velocity in rad/s
          masstoflux =       5.000    ! mass-to-magnetic flux ratio in units of critical value
          ang_Bomega =        180.    ! Angle (degrees) between B and rotation axis

jet.in:

# Runtime options file for Phantom, written 24/07/2020 10:35:38.8
# Options not present assume their default values
# This file is updated automatically after a full dump

# job name
             logfile =  jetomp01.log   ! file to which output is directed
            dumpfile =  jetomp_00000.tmp   ! dump file to start from

# options controlling run time and input/output
                tmax =         10.    ! end time
               dtmax =  0.0888576587635  ! time between dumps
                nmax =          -1    ! maximum number of timesteps (0=just get derivs and stop)
                nout =          -1    ! number of steps between dumps (-ve=ignore)
           nmaxdumps =           2    ! stop after n full dumps (-ve=ignore)
            twallmax =      000:00    ! maximum wall time (hhh:mm, 000:00=ignore)
           dtwallmax =      012:00    ! maximum wall time between dumps (hhh:mm, 000:00=ignore)
           nfulldump =          10    ! full dump every n dumps
            iverbose =           0    ! verboseness of log (-1=quiet 0=default 1=allsteps 2=debug 5=max)

# options controlling accuracy
              C_cour =       0.300    ! Courant number
             C_force =       0.250    ! dt_force number
                tolv =   1.000E-02    ! tolerance on v iterations in timestepping
               hfact =       1.200    ! h in units of particle spacing [h = hfact(m/rho)^(1/3)]
                tolh =   1.000E-04    ! tolerance on h-rho iterations
       tree_accuracy =       0.500    ! tree opening criterion (0.0-1.0)

# options controlling hydrodynamics, artificial dissipation
               alpha =       1.000    ! MINIMUM art. viscosity parameter
            alphamax =       1.000    ! MAXIMUM art. viscosity parameter
              alphaB =       1.000    ! art. resistivity parameter
         psidecayfac =       1.000    ! div B diffusion parameter
        overcleanfac =       1.000    ! factor to increase cleaning speed (decreases time step)
       hdivbbmax_max =       1.000    ! max factor to decrease cleaning timestep propto B/(h|divB|)
                beta =       2.000    ! beta viscosity
        avdecayconst =       0.100    ! decay time constant for viscosity switches

# options controlling damping
               idamp =           0    ! artificial damping of velocities (0=off, 1=constant, 2=star)

# options controlling equation of state
                ieos =           1    ! eqn of state (1=isoth;2=adiab;3=locally iso;8=barotropic)
                  mu =       2.381    ! mean molecular weight

# options controlling sink particles
       icreate_sinks =           0    ! allow automatic sink particle creation
     h_soft_sinksink =       0.000    ! softening length between sink particles
               f_acc =       0.800    ! particles < f_acc*h_acc accreted without checks

# options relating to external forces
      iexternalforce =           0    ! 1=star,2=coro,3=bina,4=prdr,5=toru,6=toys,7=exte,8=spir,9=Lens,10=dens,11=Eins,

# options controlling physical viscosity
           irealvisc =           0    ! physical viscosity type (0=none,1=const,2=Shakura/Sunyaev)
          shearparam =       0.100    ! magnitude of shear viscosity (irealvisc=1) or alpha_SS (irealvisc=2)
            bulkvisc =       0.000    ! magnitude of bulk viscosity

# options for injecting/removing particles
               rkill =      -1.000    ! deactivate particles outside this radius (<0 is off)
danieljprice commented 4 years ago

so one thing here is that the tree is just constructed differently with MPI compared to openMP, so it's possible it's just a slight difference in the gravity calculation?
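To illustrate the point (a hedged sketch, not Phantom code): even with identical physics, changing the order in which force contributions are summed — which a different tree build inevitably does — perturbs the result at machine precision, and a collapse calculation amplifies such perturbations over time.

```python
import numpy as np

rng = np.random.default_rng(42)
pos = rng.random((200, 3))  # toy particle positions (G = m = 1)

def accel_on_first(order):
    """Direct-sum gravitational acceleration on particle 0,
    accumulating the pairwise terms in the given order."""
    a = np.zeros(3)
    for j in order:
        if j == 0:
            continue
        dr = pos[j] - pos[0]
        a += dr / np.dot(dr, dr) ** 1.5
    return a

a_fwd = accel_on_first(range(200))
a_rev = accel_on_first(reversed(range(200)))

# the two orderings agree only to round-off; a tree code also
# changes *which* node-particle interactions are opened, which is
# a larger (but still legitimate) perturbation
print(np.max(np.abs(a_fwd - a_rev)))
```

Whether such round-off-level differences explain the divergence seen here is exactly what is in question below.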

jameswurster commented 4 years ago

These differences look big enough that I suspect they are not simply due to a difference in the tree build. Rhomax is similar for both runs, and rho_ave is included below (with corrected legend). Also, Bmax is similar for both, but h divB / B is really bad for the hybrid run (both the maximum and average values). I wonder if this is related to some of the current build issues.

[Attached plots: jets_11rhoave, jets_25hdivBBmax, jets_26hdivBBave]
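For context, the quantity being plotted is the standard dimensionless divergence error, h|divB|/|B|, evaluated per particle; values well below unity indicate that the divergence cleaning is working. A minimal sketch of that diagnostic (array names here are hypothetical, not Phantom's actual variables):

```python
import numpy as np

def divB_error(h, divB, B, floor=1e-30):
    """Dimensionless divergence error h*|divB|/|B| per particle.
    The floor guards against division by zero where B vanishes."""
    Bmag = np.sqrt(np.sum(B * B, axis=1))
    return h * np.abs(divB) / np.maximum(Bmag, floor)

# toy example: uniform unit field with a small spurious divergence
h = np.full(4, 0.1)
divB = np.array([0.0, 1.0, 2.0, 4.0])
B = np.tile([0.0, 0.0, 1.0], (4, 1))
err = divB_error(h, divB, B)
print(err)
```

The max and average of this array over all particles are the two quantities tracked in the plots above.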

jameswurster commented 3 years ago

An additional issue is easily reproducible by running the nonidealmhd test. For the first test, different dt_courant values are produced depending on whether MPI=yes or MPI=no. Both values are included in the test script so that all tests pass. I have yet to look into this.

EDIT: 23 March 2021:
This is no longer an issue once we removed the race condition when calculating divcurlB.
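For readers following along, the race in question is the common shared-memory pattern where several threads accumulate into the same per-particle array. A minimal sketch of the race-free alternative — private per-thread partial arrays reduced serially afterwards — written with Python threads purely for illustration (Phantom itself is Fortran/OpenMP, and the names here are made up):

```python
import threading
import numpy as np

def scatter_add(pairs, values, npart, nthreads=4):
    """Scatter pairwise contributions onto particles without a data
    race: each thread accumulates into a private partial array, and
    the partials are reduced serially at the end."""
    partials = [np.zeros(npart) for _ in range(nthreads)]

    def worker(tid):
        # each thread handles a strided subset of the pair list
        for k in range(tid, len(pairs), nthreads):
            i, j = pairs[k]
            partials[tid][i] += values[k]  # symmetric pair terms,
            partials[tid][j] -= values[k]  # as in a divB-type sum

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)  # deterministic serial reduction

pairs = [(0, 1), (1, 2), (0, 2)]
values = [1.0, 2.0, 3.0]
print(scatter_add(pairs, values, npart=3))
```

The key property is that the result no longer depends on thread scheduling, which is why removing the race also removed the MPI=yes vs MPI=no dt_courant discrepancy mentioned above.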

jameswurster commented 3 years ago

These issues remain even without the race condition when calculating divcurlB. Decreasing tree_accuracy helps, but not nearly enough; it has a trivial effect on the openMP-only model, but a noticeable effect on the hybrid model.

Note: the above plots used ieos=1, whereas I meant to use ieos=8. All comments are independent of ieos.

jameswurster commented 3 years ago

This does not appear to be an MHD issue, since hydro versions of these collapses yield similar conclusions.

Note: running the first 15 steps takes 1.9 min using openMP but 4.2 min for the hybrid run; for the purely hydro collapses, it takes 64 s using openMP but 2.3 min for the hybrid run. I have not tested how well these scale for larger and longer simulations.

danieljprice commented 3 years ago

These differences are likely because particle waking is not implemented in MPI (#142). Closing this issue as it is likely a duplicate of #142.