cloudy-astrophysics / bug-tracker-migration-test

Trial run for importing the nublado.org Trac tickets as GitHub issues
0 stars 0 forks source link

pdr abort at r9134 with gcc 4.8 (trac #284) #286

Closed cloudy-bot closed 4 years ago

cloudy-bot commented 10 years ago

reported by: @CloudyLex

 PROBLEM  ConvFail 19, Temp not converged itr 1 zone 618 fnzone 619.79 Te=4.6602e+01 Htot=6.997e-21 Ctot=7.539e-21 rel err=-7.744e-02 rel tol:5.000e-03
 PROBLEM  ConvFail 20, Temp not converged itr 1 zone 619 fnzone 620.82 Te=4.6602e+01 Htot=6.982e-21 Ctot=7.552e-21 rel err=-8.157e-02 rel tol:5.000e-03
 Stop due to excessive convergence failures - there have been 20 so far. 
 This limit can be reset with the FAILURES command.
 ConvFail sets lgAbort since nTotalFailures=20 is >= LimFail=20
 This limit can be reset with the FAILURES command.
 ####619  Te:4.660E+01 Hden:1.000E+05 Ne:1.565E+00 R:1.000E+30 R-R0:9.750E+16 dR:8.334E+13 NTR:183 Htot:6.982E-21 T912: 1.85e+04###
 Hydrogen      4.39e-04 1.98e-08 H+o/Hden 4.39e-04 1.44e-17 H-    H2 5.00e-01 3.73e-12 H2+ HeH+ 2.13e-17 Ho+ ColD 2.92e+21 1.46e+16
 Helium        1.00e+00 1.56e-07 6.02e-14 HeI 2s3S 3.60e-14 Comp H,C 1.77e-27 4.82e-29 Fill Fac 1.00e+00 Gam1/tot 7.45e-01

Migrated from https://www.nublado.org/ticket/284

{
    "status": "closed",
    "changetime": "2019-07-29T13:55:12Z",
    "_ts": "1564408512375963",
    "description": "{{{\n PROBLEM  ConvFail 19, Temp not converged itr 1 zone 618 fnzone 619.79 Te=4.6602e+01 Htot=6.997e-21 Ctot=7.539e-21 rel err=-7.744e-02 rel tol:5.000e-03\n PROBLEM  ConvFail 20, Temp not converged itr 1 zone 619 fnzone 620.82 Te=4.6602e+01 Htot=6.982e-21 Ctot=7.552e-21 rel err=-8.157e-02 rel tol:5.000e-03\n Stop due to excessive convergence failures - there have been 20 so far. \n This limit can be reset with the FAILURES command.\n ConvFail sets lgAbort since nTotalFailures=20 is >= LimFail=20\n This limit can be reset with the FAILURES command.\n ####619  Te:4.660E+01 Hden:1.000E+05 Ne:1.565E+00 R:1.000E+30 R-R0:9.750E+16 dR:8.334E+13 NTR:183 Htot:6.982E-21 T912: 1.85e+04###\n Hydrogen      4.39e-04 1.98e-08 H+o/Hden 4.39e-04 1.44e-17 H-    H2 5.00e-01 3.73e-12 H2+ HeH+ 2.13e-17 Ho+ ColD 2.92e+21 1.46e+16\n Helium        1.00e+00 1.56e-07 6.02e-14 HeI 2s3S 3.60e-14 Comp H,C 1.77e-27 4.82e-29 Fill Fac 1.00e+00 Gam1/tot 7.45e-01\n\n}}}",
    "reporter": "gary",
    "cc": "",
    "resolution": "fixed",
    "time": "2014-05-09T22:22:13Z",
    "component": "thermal convergence",
    "summary": "pdr abort at r9134 with gcc 4.8",
    "priority": "major",
    "keywords": "",
    "version": "trunk",
    "milestone": "no milestone",
    "owner": "nobody",
    "type": "defect - convergence"
}
cloudy-bot commented 10 years ago

@CloudyLex commented:

                       * TABLE ISM LOG 5.000000                                                          *
                       * HDEN=5.000000 LOG                                                               *
                       * extinguish 24, leakage = 0                                                      *
                       * stop temperature off                                                            *
                       * stop AV 10                                                                      *
                       * cosmic rays background                                                          *
                       * abundances ISM                                                                  *
                       * atom H2                                                                         *
                       * set H2 Jura rate -16.22                                                         *
cloudy-bot commented 10 years ago

@peter-van-hoof-noaccount changed milestone from "" to "no milestone"

cloudy-bot commented 10 years ago

Confirmed with the trunk at r12636.

The cause of the temperature failures is a discontinuous jump in heating(0,8) at 46.601092060747 K. This is the heating due to collisions within X of H2. Each time the heating jumps up or down, there is a reevaluation of the H2 levels which seems to be jumping back and forth between a solution with high and low heating. The temperature at the discontinuity corresponds to an energy of 32.38937 cm^-1^ exact (with 8 trailing zeros). Not sure what that means though...

to:

1552054188244995

Confirmed with the trunk at r12636.

The cause of the temperature failures is a discontinuous jump in heating(0,8) at 46.601092060747 K. This is the heating due to collisions within X of H2. Each time the heating jumps up or down, there is a reevaluation of the H2 levels which seems to be jumping back and forth between a solution with high and low heating (with the high value 25% higher than the low value).

In the low heating state (T just above threshold) states0.Pop() = 17847, states1.Pop() = 18850.1

In the high heating state (T just below threshold) states0.Pop() = 29997.8, states1.Pop() = 4791.82

The temperature at the discontinuity corresponds to an energy of 32.38937 cm^-1^ exact (with 8 trailing zeros). Not sure what that means though...

 PROBLEM  ConvFail 19, Temp not converged itr 1 zone 618 fnzone 619.79 Te=4.6602e+01 Htot=6.997e-21 Ctot=7.539e-21 rel err=-7.744e-02 rel tol:5.000e-03
 PROBLEM  ConvFail 20, Temp not converged itr 1 zone 619 fnzone 620.82 Te=4.6602e+01 Htot=6.982e-21 Ctot=7.552e-21 rel err=-8.157e-02 rel tol:5.000e-03
 Stop due to excessive convergence failures - there have been 20 so far. 
 This limit can be reset with the FAILURES command.
 ConvFail sets lgAbort since nTotalFailures=20 is >= LimFail=20
 This limit can be reset with the FAILURES command.
 ####619  Te:4.660E+01 Hden:1.000E+05 Ne:1.565E+00 R:1.000E+30 R-R0:9.750E+16 dR:8.334E+13 NTR:183 Htot:6.982E-21 T912: 1.85e+04###
 Hydrogen      4.39e-04 1.98e-08 H+o/Hden 4.39e-04 1.44e-17 H-    H2 5.00e-01 3.73e-12 H2+ HeH+ 2.13e-17 Ho+ ColD 2.92e+21 1.46e+16
 Helium        1.00e+00 1.56e-07 6.02e-14 HeI 2s3S 3.60e-14 Comp H,C 1.77e-27 4.82e-29 Fill Fac 1.00e+00 Gam1/tot 7.45e-01

to:

 PROBLEM  ConvFail 19, Temp not converged itr 1 zone 618 fnzone 619.79 Te=4.6602e+01 Htot=6.997e-21 Ctot=7.539e-21 rel err=-7.744e-02 rel tol:5.000e-03
 PROBLEM  ConvFail 20, Temp not converged itr 1 zone 619 fnzone 620.82 Te=4.6602e+01 Htot=6.982e-21 Ctot=7.552e-21 rel err=-8.157e-02 rel tol:5.000e-03
 Stop due to excessive convergence failures - there have been 20 so far. 
 This limit can be reset with the FAILURES command.
 ConvFail sets lgAbort since nTotalFailures=20 is >= LimFail=20
 This limit can be reset with the FAILURES command.
 ####619  Te:4.660E+01 Hden:1.000E+05 Ne:1.565E+00 R:1.000E+30 R-R0:9.750E+16 dR:8.334E+13 NTR:183 Htot:6.982E-21 T912: 1.85e+04###
 Hydrogen      4.39e-04 1.98e-08 H+o/Hden 4.39e-04 1.44e-17 H-    H2 5.00e-01 3.73e-12 H2+ HeH+ 2.13e-17 Ho+ ColD 2.92e+21 1.46e+16
 Helium        1.00e+00 1.56e-07 6.02e-14 HeI 2s3S 3.60e-14 Comp H,C 1.77e-27 4.82e-29 Fill Fac 1.00e+00 Gam1/tot 7.45e-01
cloudy-bot commented 10 years ago

@peter-van-hoof-noaccount commented:

The root cause of the failures appears to be an underflow in the Boltzmann factors. At low temperatures only levels 0 and 1 have a significant population and they determine heating(0,8). An important contribution to these levels (influencing the ortho/para ratio) comes from highly excited states in X, presumably through fluorescence. One these levels is level number 100. At 46.6 K the Boltzmann factors from ground are extremely low and may underflow to zero. This creates problems when converting a downward to an upward collisional rate by dividing two Boltzmann factors. If one of these Boltzmann factors underflowed to zero, the upward rate is also set to zero, which then has a significant knock-on effect on the population of the levels 0 and 1, which then leads to the problems outlined above.

The proposed fix is to rewrite the way the upward rate is calculated by calling dsexp() with the energy difference between the two levels rather than dividing the Boltzmann factors. This change needs to applied to source:trunk/source/mole_h2.cpp@12849#L2068.

cloudy-bot commented 10 years ago

@peter-van-hoof-noaccount commented:

The patch proposed in comment 4 fixed the problems in the attached test case, but resulted in serious problems in the slow/h2_orion_hii_pdr.in test case. That has two temperature failures, lots of warnings about negative level populations and lots of big botches. Investigating the convergence failures reveals very similar problems to those discussed above. The solution is to replace more instances of dividing Boltzmann factors with a call to dsexp() with the energy difference between the two levels. With that patch the test case in this report is clean, and the test suite shows only a handful of minor botches which seem a plausible result of the more accurate calculations.

cloudy-bot commented 10 years ago

Patch discussed in comment 5. Attachment: fix_pr284.patch

cloudy-bot commented 10 years ago

A slightly optimized version of the patch discussed in comment 5 has been submitted to the trunk in r12864 and to c17_branch in r12865.