Open charleskawczynski opened 12 months ago
The high domain top in dyamond and other aquaplanet simulations is not realistic for baroclinic wave initial conditions, so it may cause some problems. Maybe you can keep z_max at 30000 but use 63 layers. If you want a faster turnaround, I think 30 days would be enough to check whether the simulation is stable. That said, it should be faster than dyamond, as there is no radiation.
Also, note that the maximum timestep for moist Held-Suarez and aquaplanet is in general half that of the moist baroclinic wave. See the current longrun pipeline (using ARS, central differences, and no limiter) for an example.
@cmbengue, @dennisYatunin, @tapios, we've estimated the maximum timestep in #2542, can we close this?
Where are the results? We need the matrix of maximum allowable timestep and time to solution for ARS vs SSP, FCT vs CD, and possibly the maximum number of Newton iterations. It needs to include a stable configuration with FCT.
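For reference, the requested matrix could be enumerated along these lines. This is only a sketch: the labels mirror the shorthand used in this thread (ARS/SSP, lim/nolim, FCT/CD), while the function name `config_matrix`, the label format, and the default Newton iteration counts are illustrative assumptions, not ClimaAtmos options.

```python
from itertools import product


def config_matrix(newton_iters=(1, 3)):
    """Return one label per timestepper/limiter/advection/Newton-iteration combination.

    All names here are hypothetical shorthand, not ClimaAtmos config keys.
    """
    steppers = ("ARS", "SSP")
    limiters = ("nolim", "lim")
    advection = ("CD", "FCT")
    return [
        f"{stepper}_{lim}_{adv}_{iters}iters"
        for stepper, lim, adv, iters in product(steppers, limiters, advection, newton_iters)
    ]


if __name__ == "__main__":
    for label in config_matrix():
        print(label)
```

Each label would then need a maximum stable dt and a time to solution filled in.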
These are the current results:
dts tested ∈ [50, ..., 300] (units: seconds)
| Configuration | Stable dt (s) | Unstable dt (s) |
|---|---|---|
| ARS_nolim_CD | 150 | 160 |
| SSP_nolim_FCT | none | all |
| SSP_lim_FCT | none | all |
| SSP_lim_CD | none | all |
| SSP_nolim_CD | none | all |
| SSP_lim_CD 3 iters | 60 | 80 |
| SSP_nolim_CD 3 iters | 100 | 120 |
| ARS_nolim_FCT (old) | 80 | 90 |
| ARS_nolim_FCT (new) | 100 | 120 |
| ARS_lim_FCT | 80 | 90 |
| ARS_lim_CD | 120 | 150 |
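A stable/unstable pair like the ones in the table can be extracted from a dt sweep roughly as follows. This is a minimal sketch, assuming each run has already been classified as stable or not; the function name `bracket_max_dt` and the sweep outcomes in the example are hypothetical, not actual job results.

```python
def bracket_max_dt(results):
    """Bracket the maximum stable timestep from a sweep.

    `results` maps dt (seconds) to True if that run was stable.
    Returns (largest stable dt below the first failure, smallest unstable dt).
    Either entry is None if no such dt was tested.
    """
    unstable = sorted(dt for dt, ok in results.items() if not ok)
    smallest_unstable = unstable[0] if unstable else None
    # Only count stable runs below the first failure, so that an isolated
    # stable run at a larger dt does not inflate the estimate.
    stable = [
        dt
        for dt, ok in results.items()
        if ok and (smallest_unstable is None or dt < smallest_unstable)
    ]
    largest_stable = max(stable) if stable else None
    return largest_stable, smallest_unstable


if __name__ == "__main__":
    # Hypothetical sweep outcomes, for illustration only.
    sweep = {50: True, 100: True, 150: True, 160: False, 300: False}
    print(bracket_max_dt(sweep))
```

A sweep where every run fails returns `None` for the stable entry, which corresponds to the "none | all" rows.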
I'll close out the second part of #2510 and then update the FCT cases. The CD cases should not change, since the changes so far have not affected CD-only behavior, AFAIK.
Are the FCT cases using 1-moment microphysics? If not, then the second part of 2510 will not have any effect on them, since it will only change the results for simulations with passive tracers (q_rai, q_sno, etc.).
Thanks @charleskawczynski. Let's add the ARS_nolim_FCT to the longrun pipeline (maybe as an experimental longrun in the CPU pipeline, if you think it's sensitive to small changes in the numerics). And yes, the second part of 2510 should not have any effect, as there is no passive tracer in the current tests.
What is the SSP baseline without FCT (i.e., central differences in the vertical)?
I just started one with and without limiters here:
I'll update the table once they finish.
Thanks @charleskawczynski. Let's add the ARS_nolim_FCT to the longrun pipeline (maybe as an experimental longrun in the CPU pipeline, if you think it's sensitive to small changes in the numerics).
Sounds good, for now I'll add this at the largest stable dt, does that sound okay?
Yes, sounds good to me.
We need the matrix of maximum allowable timestep and time to solution for ARS vs SSP, FCT vs CD, and possibly the maximum number of Newton iterations. It needs to include a stable configuration with FCT.
Does it need the maximum number of Newton iterations? What determines whether it's needed or not?
Regarding a stable configuration with FCT, should this be a separate issue, given that SSP doesn't seem to be stable with FCT? I'd like to keep the scope of this issue narrow so that it can be closed in a finite time window.
Just checked some old notes. SSP may need more than 1 Newton iteration, even for CD. See #1440 and #1441. I'm fine with separating the issue into ARS and SSP to keep the scope narrow.
In the Gardner et al. paper, they found they needed multiple Newton iterations for SSP (but only one for ARS).
SSP with CD should be stable with multiple Newton iterations. That's what Gardner et al. found. I'd like to know a baseline timestep for this.
If SSP IMEX is always unstable with FCT, we can table it for now (but document results in an issue). It must mean that there's something wrong with the implementation of SSP with limiters. In that case, we can use ARS for now and revisit this later (e.g., when @OsKnoth is visiting).
I've added SSP_lim_CD and SSP_nolim_CD to the table above. SSP seems to be unconditionally unstable. I'll retry both with more Newton iterations.
It turns out that SSP is unstable even with 3 Newton iterations.
Some of the jobs are stable; the build failed only because the artifact file is too large to upload, e.g. https://buildkite.com/clima/climaatmos-ci/builds/16221#018d47cf-326f-4a81-aea7-775ae00ea867. I'm fixing it.
We need to estimate the maximum allowable timestep for:
nh_poly = 3 (Nq = 4)
h_elem = 30
z_elem = 63
Is the target resolution (which I grabbed from the dyamond job) okay? Also, how long should this be stable for to declare success? I've put 100 days for now, which would take ~0.7 hours to run assuming we can simulate at 10 SYPD. Based on the latest dyamond results (https://github.com/CliMA/ClimaAtmos.jl/issues/2314#issuecomment-1801136086), however, we're currently at 0.468 SYPD, so at the moment we should be able to finish 100 days in ~14 hours on the A100, and this will need to be a long run. Hopefully this will dramatically improve once we figure out what's wrong.
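The wall-clock estimates can be checked with a quick back-of-the-envelope calculation. The sketch below assumes SYPD means simulated years per wall-clock day and a 365-day year; the function name `wall_hours` is hypothetical.

```python
def wall_hours(sim_days, sypd, days_per_year=365.0):
    """Wall-clock hours needed to simulate `sim_days` days.

    `sypd` is the throughput in simulated years per wall-clock day.
    """
    if sypd <= 0:
        raise ValueError("sypd must be positive")
    wall_days = sim_days / (sypd * days_per_year)
    return 24.0 * wall_days


if __name__ == "__main__":
    for sypd in (10.0, 0.468):
        print(f"{sypd} SYPD: {wall_hours(100, sypd):.2f} h for 100 simulated days")
```

The same helper gives the cost of a shorter stability check, e.g. `wall_hours(30, 0.468)` for a 30-day run.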
This issue is connected to the Update Limiters milestone. We can't reference ClimaCore milestones in ClimaAtmos, but I'd like to run these simulations in ClimaAtmos CI.