Closed indra098124 closed 8 months ago
Hi @indra098124 -- could you try with the inputs files in those directories and see if that works for you? Here https://ccse.lbl.gov/pub/RegressionTesting1/ERF/ is our nightly regression test suite -- all of these should "just work" if you try them-- maybe also try some of these as well so we can rule out issues, then we can see about this particular problem.
Are you running these tests locally on a mac?
Thank you @asalmgren and @AMLattanzi for looking into this. @AMLattanzi yes, I am running these locally on Mac. @asalmgren I can run ABL cases (that are also included in nightly tests) with no problem.
Set amrex.fpe_trap_invalid = 0 in the input files, which turns off some runtime error checking. The Apple Clang compilers sometimes perform optimizations that cause the AMReX checks for divide by zero and similar errors to fail spuriously (conditional branches that are never taken and involve a divide by zero may still be evaluated). These optimizations aren't performed in debug mode, so if needed you can also run with amrex.fpe_trap_invalid = 1 if you compile with DEBUG = TRUE.
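To make the kind of check being discussed concrete, here is a minimal C++ sketch using only the standard <cfenv> flags. It does not use AMReX, and the function names are hypothetical. It shows that a division by zero sets an IEEE floating-point flag as soon as it is evaluated, which is why a guarded branch that the optimizer evaluates anyway can trip a trap even though its result is discarded.

```cpp
#include <cfenv>

// Returns true if evaluating num/den raises the IEEE "invalid" or
// "divide-by-zero" flag. The volatile locals force the division to be
// carried out at run time rather than folded away by the compiler.
bool division_raises_fp_flag(double num, double den)
{
    volatile double n = num;
    volatile double d = den;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = n / d;   // 0/0 raises "invalid", 1/0 raises "divide-by-zero"
    (void)q;
    return std::fetestexcept(FE_INVALID | FE_DIVBYZERO) != 0;
}

// A guarded division. In the source code the quotient is only used when
// den != 0, but an optimizer is free to compute num/den unconditionally
// and then select the result, which is the situation described above.
double safe_ratio(double num, double den)
{
    return (den != 0.0) ? num / den : 0.0;
}
```

The returned value of safe_ratio is correct either way; only the side effect on the floating-point flags differs, which is what a trap reacts to.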
@baperry2 -- that's really good to know -- could you add that to the docs somewhere?!
Thanks @baperry2, I was not aware of this. I tried that but it did not help. I also tried to run this test on a Linux machine and I get an "erroneous arithmetic operation" error. Looking at the backtrace, it appears that the error originates in the MOST calculation at "Source/BoundaryConditions/MOSTAverage.H:143:56".
Here is the code snippet where it fails.
    for (int n = 0; n < interp_comp; n++) {
        interp_vals[n] = sx_lo[0]*sx_lo[1]*sx_lo[2]*interp_array(i-1, j-1, k-1, n) +
                         sx_lo[0]*sx_lo[1]*sx_hi[2]*interp_array(i-1, j-1, k  , n) +
                         sx_lo[0]*sx_hi[1]*sx_lo[2]*interp_array(i-1, j  , k-1, n) +
                         sx_lo[0]*sx_hi[1]*sx_hi[2]*interp_array(i-1, j  , k  , n) +
                         sx_hi[0]*sx_lo[1]*sx_lo[2]*interp_array(i  , j-1, k-1, n) +
                         sx_hi[0]*sx_lo[1]*sx_hi[2]*interp_array(i  , j-1, k  , n) +
                         sx_hi[0]*sx_hi[1]*sx_lo[2]*interp_array(i  , j  , k-1, n) +
                         sx_hi[0]*sx_hi[1]*sx_hi[2]*interp_array(i  , j  , k  , n);
    }
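The snippet above is a trilinear interpolation over the eight cells surrounding a point. A self-contained sketch of the same weighting follows; it assumes the low and high weights in each direction sum to one, and the Corners type and function name are hypothetical rather than taken from ERF. Because every one of the eight values enters the sum, a single uninitialized or NaN neighbor (for example an unfilled ghost cell) contaminates the result.

```cpp
#include <array>

// Hypothetical 2x2x2 block of neighbor values: corner[a][b][c], where each
// of a, b, c is 0 for the low neighbor and 1 for the high neighbor.
using Corners = std::array<std::array<std::array<double, 2>, 2>, 2>;

// frac[d] in [0,1] is the fractional distance from the low neighbor toward
// the high neighbor in direction d.
double trilinear(const Corners& corner, const std::array<double, 3>& frac)
{
    std::array<double, 3> w_lo{}, w_hi{};
    for (int d = 0; d < 3; ++d) {
        w_hi[d] = frac[d];        // weight of the high neighbor
        w_lo[d] = 1.0 - frac[d];  // weight of the low neighbor
    }

    double val = 0.0;
    for (int a = 0; a < 2; ++a) {
        for (int b = 0; b < 2; ++b) {
            for (int c = 0; c < 2; ++c) {
                const double wx = a ? w_hi[0] : w_lo[0];
                const double wy = b ? w_hi[1] : w_lo[1];
                const double wz = c ? w_hi[2] : w_lo[2];
                val += wx * wy * wz * corner[a][b][c];
            }
        }
    }
    return val;
}
```

A quick sanity check for this kind of routine is that a constant field is reproduced exactly, since the eight weights sum to one.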
@asalmgren will do. Even though there appears to be more going on here, I learned about the spurious FPEs on Macs the hard way, and it would be good to have the information more widely available.
@indra098124 - I tried again and see the same thing as you. For Witch of Agnesi, I see a spurious FPE that resolves with amrex.fpe_trap_invalid = 0 when running with inputs, but the same error as you when running with inputs_most_test, which appears to be a real error.
@indra098124 Thank you for sharing the issue with inputs_most_test. The problem was that the Theta_prim variable had not yet had its ghost cells filled, and the interpolation routine (where your backtrace points) had to access that data. PR 1455 ran successfully in debug mode on my local machine with single and multiple cores. Please let me know if you have further issues.
Thank you @AMLattanzi. I modified my copy to use IntVect ng = Theta_prim[lev]->nGrowVect(); in ERF.cpp and in ERF_Advance.cpp, but it is still failing for me. I will try the version from the PR.
@indra098124 Yes, it should still fail with that revision. The creation of the MOST class and the calls to the MOST averaging needed to be moved later, after the ghost cells were populated by FillPatch. If you see the issue arise, or a new issue, with the current development (e9bcaa0), let me know.
@AMLattanzi unfortunately, it is still failing for me with the latest version. I tried debug version as well. With debug I get the following error (on Mac and on Linux).
amrex::Abort::1:: (127,-1,-1,0) is out of bound (125:258,-3:10,0:63,0:0) !!!
SIGABRT
amrex::Abort::0:: (117,1,-1,0) is out of bound (-3:130,-3:10,0:63,0:0) !!!
SIGABRT
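One way to read the abort message is to compare each index with its lo:hi range in turn. The sketch below does that for the two reported accesses; the Range type and function name are hypothetical, and the interpretation of the four positions as (i, j, k, component) is an assumption based on the message layout. In both reports it is the third index, -1, that lies below the lower bound 0 of the range 0:63.

```cpp
#include <array>

// Inclusive index range lo:hi, as printed in the abort message.
struct Range { int lo; int hi; };

// Returns the position (0 = i, 1 = j, 2 = k, 3 = component) of the first
// index that falls outside its range, or -1 if the access is in bounds.
int first_out_of_bounds(const std::array<int, 4>& idx,
                        const std::array<Range, 4>& box)
{
    for (int d = 0; d < 4; ++d) {
        if (idx[d] < box[d].lo || idx[d] > box[d].hi) { return d; }
    }
    return -1;
}
```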
I tried running realclean and also a fresh download.
@AMLattanzi and @asalmgren there are other cases as well that are failing for me. I am not sure if I am doing something wrong.
Thank you!
I believe we didn’t mean to build with USE_POISSON_SOLVE on. If you set that to false does it build ok?
Thank you for all the great feedback! We need to do a better job of making sure the inputs files in the repo work correctly.
Ann Almgren Senior Scientist; Dept. Head, Applied Mathematics Pronouns: she/her/hers
On Sun, Feb 25, 2024 at 1:23 PM indra098124 @.***> wrote:
@AMLattanzi and @asalmgren there are other cases as well that are failing for me. I am not sure if I am doing something wrong.
- ABL/inputs.write -> The input file needed prob.T_0 = 300.0; after that it worked.
- ABL/inputs.read -> This has been giving a segfault. The backtrace points to if (input_bndry_planes && m_r2d->ingested_velocity()) in ERF_init_bcs.cpp:86. Debug mode and assertions don't reveal anything more.
- ABL_input_sounding does not compile. I only needed input_sounding, which is what put me on track to find the compilation issue. The error is related to USE_POISSON_SOLVE = TRUE. It gives the error /TI_headers.H:270:30: error: 'Vector' does not name a type 270 | const Vector<amrex::Real> d_rayleigh_ptrs_at_lev); I think it should be amrex::Vector. There was another error about use_rayleigh_damping not being declared, which might be a typo, since elsewhere it is referenced as solverChoice.use_rayleigh_damping. At TI_no_substep_fun.H:133:13 the compiler complains that incompressible is not declared. Lastly, at TI_slow_rhs_fun.H:357:25 I get the error: cannot convert 'std::unique_ptr<amrex::MultiFab>' to 'const amrex::MultiFab' erf_slow_rhs_inc(level, nrk, slow_dt. I could use input_sounding when I disable poisson_solve.
Thank you!
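Two of the compile errors listed above are common C++ patterns, and a stand-alone sketch may help when reading them. The demo namespace, its Vector alias, and its MultiFab struct are hypothetical stand-ins, not AMReX types, and whether the real function expects a pointer or a reference is not stated in the thread, so this is not the ERF fix. The first part shows a namespaced type that must be qualified when no using-declaration is in scope; the second shows that a std::unique_ptr does not convert implicitly to a raw pointer and needs .get().

```cpp
#include <memory>
#include <vector>

namespace demo {   // hypothetical stand-in for a library namespace
    template <class T> using Vector = std::vector<T>;
    struct MultiFab { double value = 0.0; };
}

// Outside namespace demo, writing plain "Vector<double>" here would fail
// with "'Vector' does not name a type"; the qualified name is required.
double sum(const demo::Vector<double>& v)
{
    double s = 0.0;
    for (double x : v) { s += x; }
    return s;
}

// A function that takes a non-owning raw pointer.
double read_value(const demo::MultiFab* mf)
{
    return mf->value;
}

// Passing the unique_ptr itself to read_value would not compile;
// .get() hands over the raw pointer while ownership stays with "owner".
double read_owned(const std::unique_ptr<demo::MultiFab>& owner)
{
    return read_value(owner.get());
}
```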
Thank you @asalmgren and thank you ERF development team for making the software available open source. Yes, after disabling the poisson solver, I can build and run this.
The last thing I am figuring out is how to use boundary input.
@indra098124 sounds like things are alright on this front? Are we good to close this particular issue?
I believe inputs.write and inputs.read should work once PR 1461 goes through.
@AMLattanzi thanks for following up. I am not sure, but MOST with terrain still fails for me with the following error:
amrex::Abort::1:: (127,-1,-1,0) is out of bound (125:258,-3:10,0:63,0:0) !!!
SIGABRT
amrex::Abort::0:: (117,1,-1,0) is out of bound (-3:130,-3:10,0:63,0:0) !!!
SIGABRT
I am not sure. May I confirm if you were able to run terrain3d_Hemisphere successfully?
Ah, I have not tested hemisphere with MOST! Let me give that a go and I can either follow up with the results or create a PR to alleviate the issue. Thanks for clarifying.
@indra098124 I believe I have corrected the issue with MOST and the 3d hemisphere in PR 1465. Thank you again for bringing these issues to our attention, we greatly appreciate the feedback.
Thank you @AMLattanzi for your help.
@AMLattanzi after the new fix, inputs_most_test in ABL seems to be broken. I find that with erf.most.average_policy = 0, the code diverges at the first time step with the error "0::Assertion `cell_data(i,j,k,RhoTheta_comp) > 0.' failed, file "../../Source/TimeIntegration/ERF_slow_rhs_pre.cpp", line 566". most_average_policy = 1 works fine. Would you mind having a look?
Many thanks
Additionally, it looks like there is an issue with MOST when a surface temperature is used. It always gives SIGILL Invalid, privileged, or ill-formed instruction. For example, see the GABLS1 case.
@indra098124 The issue with the hemisphere should be corrected in PR 1468. The salient problem was that the turbulent viscosity was 0 for the given initialization; this is inconsistent with the MOST BC and the limiting we did with 1e-16 was not sufficient for stability. I also added an option for small perturbations in the IC to give finite strain and thus non-zero turbulent viscosity with Smagorinsky (the fluctuations seem to dissipate quickly). This ran for planar and local average for 10 steps.
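The link between zero strain and zero turbulent viscosity can be seen from the standard Smagorinsky closure, nu_t = (Cs * delta)^2 * |S|. The sketch below is not ERF's implementation; it assumes a pure vertical shear du/dz (for which |S| = |du/dz|), and the constant Cs = 0.17 and filter width of 10 m in the usage note are hypothetical values. It also shows that a floor of 1e-16 leaves the viscosity effectively zero when the initial field has no strain.

```cpp
#include <algorithm>
#include <cmath>

// Standard Smagorinsky eddy viscosity for a pure vertical shear du/dz.
// With |S| = sqrt(2 S_ij S_ij) and only S_13 = S_31 = 0.5*du/dz nonzero,
// the strain-rate magnitude reduces to |du/dz|.
double smagorinsky_nu_t(double cs, double delta, double dudz)
{
    const double strain_mag = std::fabs(dudz);
    return (cs * delta) * (cs * delta) * strain_mag;
}

// Apply a lower bound to the viscosity, as a simple limiter would.
double limited_nu_t(double nu_t, double floor_value)
{
    return std::max(nu_t, floor_value);
}
```

With a uniform initial velocity, smagorinsky_nu_t(0.17, 10.0, 0.0) is exactly zero and the limited value is only the floor itself, whereas a small perturbation such as du/dz = 1e-3 already gives a viscosity many orders of magnitude larger than 1e-16.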
With respect to the GABLS case, I am unable to replicate that issue. The instruction error you mention sounds like the mac issue Bruce explained. I have yet to see that error on a Linux machine with ERF. Perhaps try in DEBUG mode.
Thanks @AMLattanzi . This PR seems to have fixed the other issues (GABLS and ABLMost). I can see the ABLMost regression test ran successfully (https://ccse.lbl.gov/pub/RegressionTesting1/ERF/) while it was failing earlier today. Also thank you for explaining what was wrong.
Many thanks
@indra098124 -- are we good to close this issue?
Thank you @AMLattanzi. Yes @asalmgren we can close this.
Hi there, I tried the MOST tests provided in terrain3d_Hemisphere and WitchOfAgnesi. Both of these tests are failing for me. Are they expected to run from the initial condition defined in prob, or should we run them without MOST first? I am using the latest version of the code and getting a "SIGILL Invalid, privileged, or ill-formed instruction" error with these tests.
Many thanks for developing the code and answering my question.