AMReX-Astro / MAESTROeX

A C++ low Mach number stellar hydrodynamics code
https://amrex-astro.github.io/MAESTROeX/
BSD 3-Clause "New" or "Revised" License

Urca Abundances don't sum to 1 on restart #314

Closed. biboyd closed this issue 2 years ago.

biboyd commented 2 years ago

When restarting Urca from a checkpoint file, the run fails due to abundances not summing to 1.

Initializing from checkpoint ./chk0000200
Restart from checkpoint ./chk0000200
read CPU time: 57183666.11
inner sponge: r_sp      , r_tp      : 136812500, 162062500
outer sponge: r_sp_outer, r_tp_outer: 162062500, 164062500
Calling Evolve()
Call to estdt for level 0 gives dt_lev = 0.0009801217021
Minimum estdt over all levels = 0.0009801217021
Call to estdt at beginning of step 201 gives dt =0.0009801217021

Timestep 201 starts with TIME = 4.323252704 DT = 0.0009801217021

Cell Count:
Level 0, 1073741824 cells
inner sponge: r_sp      , r_tp      : 136812500, 162062500
outer sponge: r_sp_outer, r_tp_outer: 162062500, 164062500
<<< STEP 1 : react state >>>
amrex::Abort::112::ERROR: abundances do not sum to 1 !!!

This can be reproduced by running a 64^3 problem with the inputs file and model file in inputfiles.zip.

Restarting from any checkpoint file numbered 6 or later should reproduce the error.

zingale commented 2 years ago

I can't seem to reproduce this. How many processors did you use when you saw this? And are you using MPI and OpenMP, or just OpenMP?

zingale commented 2 years ago

Ah, okay, I can reproduce it with chk000010.

zingale commented 2 years ago

This is in the inputs file:

# increase tol so doesn't fail on restart
maestro.reaction_sum_tol = 1.e-7

Is that something you added, or has it been there a while?
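The abort in the log above comes from a check that the species mass fractions in each cell sum to 1. A minimal Python sketch, assuming the check compares |sum(X_k) - 1| against `maestro.reaction_sum_tol` (the helper `abundances_sum_ok` and the sample mass fractions are hypothetical, not MAESTROeX code), shows how loosening the tolerance lets a small drift through:

```python
import math


def abundances_sum_ok(X, tol):
    """Return True if the mass fractions X sum to 1 within tol.

    Hypothetical stand-in for the code's abundance-sum check,
    assuming it tests |sum(X) - 1| <= tol; not MAESTROeX source.
    """
    # math.fsum avoids adding extra roundoff of our own to the sum.
    return abs(math.fsum(X) - 1.0) <= tol


# Illustrative mass fractions whose sum is off from 1 by about 5e-8.
X = [0.5, 0.3, 0.2 - 5.0e-8]

print(abundances_sum_ok(X, 1.0e-10))  # False: a strict tolerance aborts
print(abundances_sum_ok(X, 1.0e-7))   # True: the loosened tolerance passes
```

Raising the tolerance only hides the drift; it says nothing about where the drift comes from.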

biboyd commented 2 years ago

Oh, I think that was something I added while I was messing around.

biboyd commented 2 years ago

I think I found the problem. It looks like the data were being saved in single precision. The inputs file has:

# Write out plotfile data in single precision
fab.format = NATIVE_32

Commenting out this line fixes all of the restart problems.
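Storing data in single precision rounds each value to roughly 7 significant decimal digits, so mass fractions that sum to 1 in double precision generally do not after a round trip through 32-bit floats. A minimal Python sketch, assuming the restarted species data pass through single precision in this way (the four mass fractions are illustrative, not taken from the Urca run), emulates the rounding with the standard `struct` module:

```python
import math
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Illustrative mass fractions; they sum to exactly 1.0 in double precision.
X = [0.1, 0.2, 0.3, 0.4]
# The same values after a round trip through 32-bit floats.
X32 = [to_float32(x) for x in X]

err64 = abs(math.fsum(X) - 1.0)
err32 = abs(math.fsum(X32) - 1.0)
print(f"double: |sum - 1| = {err64:.3e}")  # double: |sum - 1| = 0.000e+00
print(f"single: |sum - 1| = {err32:.3e}")  # single: |sum - 1| = 2.235e-08
```

Here the single-precision error (about 2e-8) would fail a tight sum check but pass with `maestro.reaction_sum_tol = 1.e-7`, which fits the workaround in the inputs file. The drift in a real run, with many species and cells, may be larger.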

zingale commented 2 years ago

Oh, good catch. That's interesting, though, since that setting usually doesn't affect checkpoint files. Maybe we are doing something different in MAESTROeX?

doreenfan commented 2 years ago

This issue has been fixed in PR #316.