AMReX-Astro / Castro

Castro (Compressible Astrophysics): An adaptive mesh, astrophysical compressible (radiation-, magneto-) hydrodynamics simulation code for massively parallel CPU and GPU architectures.
http://amrex-astro.github.io/Castro
Other
293 stars 99 forks source link

VODE failure in Detonation with self-consistent NSE #2768

Open yut23 opened 4 months ago

yut23 commented 4 months ago

I was testing stuff on my workstation, and ran into this error (which Zhi was able to reproduce on groot, but not on bender).

This change in Microphysics fixes the integration failure, but I have no idea why. I was getting an infinity when dividing by k_B, so I moved it after dividing by T_in (>1) and multiplying by MeV2erg (<1) to try and avoid the overflow. That shouldn't matter though, since it would be clamped to 500 anyway.

diff --git a/nse_solver/nse_solver.H b/nse_solver/nse_solver.H
index 3b6578f4..84cfdbe4 100644
--- a/nse_solver/nse_solver.H
+++ b/nse_solver/nse_solver.H
@@ -135,7 +135,7 @@ void apply_nse_exponent(T& nse_state) {
       exponent = amrex::min(500.0_rt,
                             (zion[n] * nse_state.mu_p + (aion[n] - zion[n]) *
                              nse_state.mu_n - u_c + network::bion(n+1)) /
-                            C::k_B / T_in * C::Legacy::MeV2erg);
+                            T_in * C::Legacy::MeV2erg / C::k_B);

       nse_state.xn[n] *= std::exp(exponent);
   }

I built with make USE_MPI=FALSE USE_NSE_NET=TRUE USE_SIMPLIFIED_SDC=TRUE and ran ./Castro1d.gnu.SMPLSDC.ex inputs-det-x.nse_net.

Output from last step

``` ...estimated hydro-limited timestep at level 0: 3.222580504e-07 ...hydro-limited CFL timestep constrained at (i) = (193) Castro::estTimeStep (hydro-limited) at level 0: estdt = 3.222580504e-07 [Level 0 step 56] ADVANCE at time 9.951155886e-06 with dt = 3.222580504e-07 Beginning subcycle 1 starting at time 9.951155886e-06 with dt = 3.222580504e-07 Estimated number of subcycles remaining: 1 Beginning SDC iteration 1 of 2. ... Entering construct_ctu_hydro_source() on level 0 ... Leaving construct_ctu_hydro_source() on level 0 Castro::construct_ctu_hydro_source() time = 0.001451384 on level 0 ... Entering burner on level 0 and doing full timestep of burning. burn entered NSE during integration (after 60 steps), zone = (184, 0, 0) recovering burn failure in NSE, zone = (184, 0, 0) burn entered NSE during integration (after 60 steps), zone = (185, 0, 0) recovering burn failure in NSE, zone = (185, 0, 0) burn entered NSE during integration (after 59 steps), zone = (186, 0, 0) recovering burn failure in NSE, zone = (186, 0, 0) burn entered NSE during integration (after 19 steps), zone = (187, 0, 0) recovering burn failure in NSE, zone = (187, 0, 0) DVODE: error test failed repeatedly or with abs(H) = HMIN [ERROR] integration failed in net istate = -2 zone = (194, 0, 0) time = 2.025443356e-07 dt = 3.22258050406865e-07 dens start = 212399378.6385347 temp start = 3981719770.328372 rhoe start = 1.904100715097615e+26 xn start = 0.0007153127934977045 0.812844273308981 0.002962516354105738 9.128545590383544e-08 0.05272940826057532 0.002210605450790213 0.001337929768998882 0.001078099952268411 0.006333973696701896 0.01574582073890216 0.01309927771736488 0.020138575954406 0.01535275490213217 0.04506819421271176 0.001062676621846327 0.0006935265398149839 0.001921922549013335 0.006705039892433533 dens current = 216440008.8108908 temp current = 7511468610.831602 xn current = 0.0006959642445108953 0.5907176390225434 -0.001561313237238796 -1.444856999715345e-09 -0.009999999986059249 1.226987861599856e-07 1.299721875453768e-06 1.530013282592945e-05 0.0004690583581444161 0.002557895007678554 0.02103677770907929 0.003968387633176645 0.00451799109766617 0.007302822611138544 0.001761289015035479 0.01192144675640181 0.07894543844077183 0.2876514946522588 A(rho) = 19949361507225.72 A(rho e) = 6.987402724044161e+31 A(rho X_k) = 13585002171.45476 -5463760884135.337 -169546081756.1912 -2280381.087711663 -3832671099895.636 -160642110030.9785 -97049770872.25642 -77427775252.36699 -416523621923.3657 -913454101324.0404 1029506938445.772 -1102818950874.122 -702520593397.8518 -2607860536593.861 85364403657.76393 1052176817164.868 7168432931476.591 26144573220746.41 rho = 216440008.811 T = 7511468610.83 xn = 0.000695964244511 0.590717639023 -0.00156131323724 -1.44485699972e-09 -0.00999999998606 1.2269878616e-07 1.29972187545e-06 1.53001328259e-05 0.000469058358144 0.00255789500768 0.0210367777091 0.00396838763318 0.00451799109767 0.00730282261114 0.00176128901504 0.0119214467564 0.0789454384408 0.287651494652 (i, j, k) = 194 0 0 y[SRHO] = 216440008.811 y[SEINT] = 2.81045443531e+26 y[SFS:] = 148912.641199 126393452.715 0 0 0 26.253360662 278.096207995 3273.70724542 100362.510774 547302.400243 4501153.83899 849099.775484 966696.19535 1562555.27753 376855.853175 2550783.51713 16891634.6516 61547621.3777 ydot_a[SRHO] = 1.99493615072e+13 ydot_a[SEINT] = 6.98740272404e+31 ydot_a[SFS:] = 13585002171.5 -5.46376088414e+12 -169546081756 -2280381.08771 -3.8326710999e+12 -160642110031 -97049770872.3 -77427775252.4 -416523621923 -913454101324 1.02950693845e+12 -1.10281895087e+12 -702520593398 -2.60786053659e+12 85364403657.8 1.05217681716e+12 7.16843293148e+12 2.61445732207e+13 0 amrex::Error::0::unsuccessful burn !!! SIGABRT See Backtrace.0 file for details ```

Castro --describe

``` ============================================================================== Castro Build Information ============================================================================== build date: 2024-03-08 18:59:30.544102 build machine: Linux xrb 6.2.8-200.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:11:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux build dir: /home/eric/dev/Castro/Exec/science/Detonation AMReX dir: /home/eric/dev/amrex COMP: gnu COMP version: 12.2.1 C++ compiler: g++ C++ flags: -Werror=return-type -gdwarf-4 -O3 -std=c++17 -pthread -DBL_NO_FORT -DAMREX_GPU_MAX_THREADS=0 -DBL_SPACEDIM=1 -DAMREX_SPACEDIM=1 -DBL_FORT_USE_UNDERSCORE -DAMREX_FORT_USE_UNDERSCORE -DBL_Linux -DAMREX_Linux -DAMREX_DIMENSION_AGNOSTIC -DNDEBUG -DAMREX_NO_PROBINIT -DSPONGE -DREACTIONS -DSDC_EVOLVE_ENERGY -DSIMPLIFIED_SDC -DSDC -DSCREEN_METHOD=SCREEN_METHOD_chabrier1998 -DNSE_NET -DNSE -DNETWORK_HAS_CXX_IMPLEMENTATION -DALLOW_JACOBIAN_CACHING -DSCREENING -DNEUTRINOS -DNAUX_NET=0 -Itmp_build_dir/s/1d.gnu.SMPLSDC.EXE -I. -I/home/eric/dev/amrex/Src/Base -I/home/eric/dev/amrex/Src/Base/Parser -I/home/eric/dev/amrex/Src/AmrCore -I/home/eric/dev/amrex/Src/Amr -I/home/eric/dev/amrex/Src/Boundary -I/home/eric/dev/Microphysics/util -I/home/eric/dev/Microphysics/util/gcem/include -I/home/eric/dev/Microphysics/integration/VODE -I/home/eric/dev/Microphysics/integration/utils -I/home/eric/dev/Microphysics/integration -I/home/eric/dev/Microphysics/screening -I/home/eric/dev/Microphysics/neutrinos -I./ -I/home/eric/dev/Castro/Source/driver -I/home/eric/dev/Castro/Source/hydro -I/home/eric/dev/Castro/Source/problems -I/home/eric/dev/Castro/Source/sources -I/home/eric/dev/Castro/Source/reactions -I/home/eric/dev/Microphysics/EOS -I/home/eric/dev/Microphysics/EOS/helmholtz -I/home/eric/dev/Microphysics/networks/subch_base -I/home/eric/dev/Microphysics/EOS -I/home/eric/dev/Microphysics/networks -I/home/eric/dev/Microphysics/interfaces -I/home/eric/dev/Microphysics/constants -I/home/eric/dev/Microphysics/nse_solver -I/home/eric/dev/Microphysics/util/hybrj -Itmp_build_dir/castro_sources/1d.gnu.SMPLSDC.EXE -I/home/eric/dev/amrex/Tools/C_scripts Fortran comp: gfortran Fortran flags: -g -O3 -ffree-line-length-none -fno-range-check -fno-second-underscore -fimplicit-none Link flags: -L. Libraries: EOS: /home/eric/dev/Microphysics/EOS/helmholtz NETWORK: /home/eric/dev/Microphysics/networks/subch_base INTEGRATOR: VODE SCREENING: chabrier1998 Castro git describe: 24.03-1-g25cc6d8f8 AMReX git describe: 24.03-5-gba95d4cd8 Microphysics git describe: 24.03-2-g9dd9dbb6 ```

zhichen3 commented 4 months ago

what was the value you're getting after moving k_B to the end?

yut23 commented 4 months ago

I get 1.0326211552283207e+297, with mu_p = 2.2538475573742366e+296. I also tried avoiding the overflow altogether by replacing the amrex::min call with a comparison against 500.0_rt * C::k_B * T_in / C::Legacy::MeV2erg, but the integration still failed:

      // prevent an overflow on exp by capping the exponent -- we hope that a subsequent
      // iteration will make it happy again

      Real numer = (zion[n] * nse_state.mu_p + (aion[n] - zion[n]) *
                    nse_state.mu_n - u_c + network::bion(n+1));
      if (numer > 500.0_rt * T_in * C::k_B / C::Legacy::MeV2erg) {
          exponent = 500.0_rt;
      } else {
          exponent = numer / C::k_B / T_in * C::Legacy::MeV2erg;
          // uncommenting this fixes the integration failure
          //exponent = numer / T_in * C::Legacy::MeV2erg / C::k_B;
      }
zhichen3 commented 4 months ago

I think the problem is with nse state mass fraction in the previous timestep. If it overflows we get e500, but we get e297, which means a different NSE composition, and its the wrong one and could be an unrealistic one. So vode in the next timestep is having hard time integrating.

So I think we need to be careful about the overflowing issue, and most importantly mu_p and mu_n need to be capped during hybrj because it certainly cannot be e296, which I think is also why it thinks e500 is a valid exponent that satisfies the constraint equations due to the absurd chemical potentials.

zhichen3 commented 4 months ago

hmm, never mind. I think it never really returns those answers as the solution.

zingale commented 3 weeks ago

is this still an issue?

yut23 commented 3 weeks ago

I managed to reproduce it with burn_cell_sdc on the same checkout of Microphysics, but I had to manually set burn_state.y[9] = 3344402.5410965709 in burn_cell.H. The crash only shows up with that exact value, which is not accessible by doing rho * xn[9]: rho * xn[9] is 1 ULP too low, and rho * nextafter(xn[9], xn[9]+1) is 1 ULP too high.

It no longer crashes on development, and git bisect says it was fixed in https://github.com/AMReX-Astro/Microphysics/commit/f361dc38c97d5a2c1c93edeacbaf3fa84054d431. I'm pretty sure whatever is causing this is still present, it's just harder to find.

yut23 commented 3 weeks ago

Here's the inputs file I used:

unit_test.density = 212399378.63853467
unit_test.temperature = 3981719770.3283715
unit_test.rhoe = 1.904100715097615e+26
unit_test.tmax = 3.22258050406865e-07
unit_test.nsteps = 1

unit_test.X1 = 0.0007153127934977045
unit_test.X2 = 0.812844273308981
unit_test.X3 = 0.0029625163541057383
unit_test.X4 = 9.128545590383544e-08
unit_test.X5 = 0.052729408260575324
unit_test.X6 = 0.0022106054507902134
unit_test.X7 = 0.0013379297689988816
unit_test.X8 = 0.0010780999522684105
unit_test.X9 = 0.0063339736967018955
unit_test.X10 = 0.015745820738902152  # this is the problematic value
unit_test.X11 = 0.013099277717364884
unit_test.X12 = 0.020138575954406004
unit_test.X13 = 0.015352754902132172
unit_test.X14 = 0.04506819421271176
unit_test.X15 = 0.0010626766218463275  # this one is also different from Castro
unit_test.X16 = 0.0006935265398149839
unit_test.X17 = 0.0019219225490133352
unit_test.X18 = 0.006705039892433533

unit_test.mu_p = -7.732686016863321
unit_test.mu_n = -9.806752357425662

unit_test.Adv_rho = 19949361507225.723
unit_test.Adv_rhoe = 6.987402724044161e+31

unit_test.Adv_X1 = 13585002171.454762
unit_test.Adv_X2 = -5463760884135.337
unit_test.Adv_X3 = -169546081756.1912
unit_test.Adv_X4 = -2280381.087711663
unit_test.Adv_X5 = -3832671099895.6357
unit_test.Adv_X6 = -160642110030.9785
unit_test.Adv_X7 = -97049770872.25642
unit_test.Adv_X8 = -77427775252.36699
unit_test.Adv_X9 = -416523621923.3657
unit_test.Adv_X10 = -913454101324.0404
unit_test.Adv_X11 = 1029506938445.7719
unit_test.Adv_X12 = -1102818950874.1218
unit_test.Adv_X13 = -702520593397.8518
unit_test.Adv_X14 = -2607860536593.861
unit_test.Adv_X15 = 85364403657.76393
unit_test.Adv_X16 = 1052176817164.8684
unit_test.Adv_X17 = 7168432931476.591
unit_test.Adv_X18 = 26144573220746.41

integrator.call_eos_in_rhs = 1

integrator.rtol_spec = 1.e-6
integrator.atol_spec = 1.e-6
integrator.rtol_enuc = 1.e-6

integrator.jacobian = 2
integrator.renormalize_abundances = 0
integrator.ode_max_steps = 1500000

nse.nse_molar_independent = 0
nse.nse_dx_independent = 1
nse.ase_tol = 0.1
nse.nse_skip_molar = 0