CliMA / ClimaAtmos.jl

ClimaAtmos.jl is a library for building atmospheric circulation models that is designed from the outset to leverage data assimilation and machine learning tools. We welcome contributions!
Apache License 2.0

Estimate maximum allowable timestep for moist baro wave #2344

Status: open. Opened by charleskawczynski 12 months ago.

charleskawczynski commented 12 months ago

We need to estimate the maximum allowable timestep for:

Is the target resolution (which I grabbed from the dyamond job) okay? Also, how long does the run need to stay stable before we declare success? I've set 100 days for now, which will take ~1.5 hours to run assuming we can simulate at 10 SYPD (simulated years per day), so this will need to be a long run. Based on the latest dyamond results (https://github.com/CliMA/ClimaAtmos.jl/issues/2314#issuecomment-1801136086), we're currently at 0.468 SYPD, so at the moment we should be able to finish 100 days in ~13 hours on the A100. Hopefully this will improve dramatically once we figure out what's wrong.
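The run-length estimates above come from converting a throughput in SYPD into wall-clock time. A minimal sketch of that conversion, assuming a 365-day year and counting only time-stepping throughput (compilation, initialization, and output overhead are not included); the function name is illustrative only:

```python
def wall_clock_hours(sim_days: float, sypd: float, days_per_year: float = 365.0) -> float:
    """Wall-clock hours needed to simulate `sim_days` at `sypd` simulated years per day.

    Assumes a constant throughput and ignores startup/compilation time.
    """
    sim_years = sim_days / days_per_year  # simulated length in years
    wall_days = sim_years / sypd          # wall-clock days at the given throughput
    return wall_days * 24.0               # convert wall-clock days to hours


# Compare the current throughput with the 10 SYPD target for a 100-day run.
for sypd in (0.468, 10.0):
    print(f"{sypd:>6} SYPD -> {wall_clock_hours(100, sypd):.2f} h for 100 simulated days")
```

Because the conversion is linear in both arguments, halving the simulated length (e.g., a 30-day stability check instead of 100 days) scales the wall-clock time by the same factor.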

This issue is connected to the Update Limiters milestone. We can't reference ClimaCore milestones in ClimaAtmos, but I'd like to run these simulations in ClimaAtmos CI.

szy21 commented 12 months ago

The high domain top in dyamond and other aquaplanet simulations is not realistic for baroclinic wave initial conditions, so it may cause some problems. Maybe you can keep z_max at 30000 m but use 63 layers. If you want a faster turnaround, I think 30 days would be long enough to check whether the simulation is stable. That said, it should be faster than dyamond, since there is no radiation.

szy21 commented 11 months ago

Also, just note that the maximum timestep of moist Held-Suarez and aquaplanet is in general about half that of moist baroclinic wave. See the current longrun pipeline (using ARS, central differences, and no limiter) for an example.

charleskawczynski commented 9 months ago

@cmbengue, @dennisYatunin, @tapios, we've estimated the maximum timestep in #2542, can we close this?

tapios commented 9 months ago

Where are the results? We need the matrix of maximum allowable timestep and time to solution for ARS vs SSP, FCT vs CD, and possibly the maximum number of Newton iterations. It needs to include a stable configuration with FCT.

charleskawczynski commented 9 months ago

These are the current results:

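dts tested ∈ [50, ..., 300] (units: seconds)
Key: ARS/SSP = IMEX time-stepping scheme; lim/nolim = with/without limiter; CD/FCT = central differences / flux-corrected transport in the vertical; "3 iters" = 3 Newton iterations.
"stable" = largest tested dt that stayed stable; "unstable" = smallest tested dt that went unstable; none/all = no/every tested dt.

configuration        | stable dt (s) | unstable dt (s)
ARS_nolim_CD         | 150           | 160
SSP_nolim_FCT        | none          | all
SSP_lim_FCT          | none          | all
SSP_lim_CD           | none          | all
SSP_nolim_CD         | none          | all
SSP_lim_CD 3 iters   | 60            | 80
SSP_nolim_CD 3 iters | 100           | 120
ARS_nolim_FCT (old)  | 80            | 90
ARS_nolim_FCT (new)  | 100           | 120
ARS_lim_FCT          | 80            | 90
ARS_lim_CD           | 120           | 150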
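Each row brackets the maximum stable dt between its largest stable and smallest unstable values (e.g., between 120 s and 150 s for ARS_lim_CD). If a tighter estimate is wanted, the bracket can be narrowed by bisection. A minimal sketch, assuming stability is monotone in dt and using a hypothetical `is_stable(dt)` callback that stands in for one full simulation; this is not part of ClimaAtmos or its CI:

```python
from typing import Callable, Tuple


def refine_max_stable_dt(
    is_stable: Callable[[float], bool],
    dt_stable: float,
    dt_unstable: float,
    tol: float = 5.0,
) -> Tuple[float, float]:
    """Shrink a bracket [dt_stable, dt_unstable) around the maximum stable timestep.

    Assumes monotone stability: every dt below the threshold is stable and every
    dt at or above it is unstable. Each `is_stable` call represents one full run,
    so about log2((dt_unstable - dt_stable) / tol) runs are needed.
    """
    if not dt_stable < dt_unstable:
        raise ValueError("need dt_stable < dt_unstable")
    while dt_unstable - dt_stable > tol:
        dt_mid = 0.5 * (dt_stable + dt_unstable)
        if is_stable(dt_mid):
            dt_stable = dt_mid    # threshold lies above dt_mid
        else:
            dt_unstable = dt_mid  # threshold lies at or below dt_mid
    return dt_stable, dt_unstable


# Toy stand-in for a real run (hypothetical threshold of 137 s), starting from
# a 120 s / 150 s bracket like the ARS_lim_CD row.
lo, hi = refine_max_stable_dt(lambda dt: dt < 137.0, 120.0, 150.0, tol=2.0)
print(f"max stable dt lies in [{lo}, {hi}) s")
```

Rows marked none/all have no stable point inside the tested range, so there is no bracket to refine; those cases need a smaller starting dt first.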

I'll close out the second part of #2510 and then rerun the FCT cases. The CD cases should not change since, as far as I know, the changes so far have not affected CD-only behavior.

dennisYatunin commented 9 months ago

Are the FCT cases using 1-moment microphysics? If not, then the second part of #2510 will not have any effect on them, since it only changes the results for simulations with passive tracers (q_rai, q_sno, etc.).

szy21 commented 9 months ago

Thanks @charleskawczynski. Let's add the ARS_nolim_FCT to the longrun pipeline (maybe as an experimental longrun in the CPU pipeline, if you think it's sensitive to small changes in the numerics). And yes, the second part of #2510 should not have any effect, as there are no passive tracers in the current tests.

tapios commented 9 months ago

What is the SSP baseline without FCT (i.e., central differences in the vertical)?

charleskawczynski commented 9 months ago

I just started runs with and without limiters here:

I'll update the table once they finish.

charleskawczynski commented 9 months ago

> Thanks @charleskawczynski. Let's add the ARS_nolim_FCT to the longrun pipeline (maybe as an experimental longrun in the CPU pipeline, if you think it's sensitive to small changes in the numerics).

Sounds good, for now I'll add this at the largest stable dt, does that sound okay?

szy21 commented 9 months ago

> Thanks @charleskawczynski. Let's add the ARS_nolim_FCT to the longrun pipeline (maybe as an experimental longrun in the CPU pipeline, if you think it's sensitive to small changes in the numerics).
>
> Sounds good, for now I'll add this at the largest stable dt, does that sound okay?

Yes, sounds good to me.

charleskawczynski commented 9 months ago

> We need the matrix of maximum allowable timestep and time to solution for ARS vs SSP, FCT vs CD, and possibly the maximum number of Newton iterations. It needs to include a stable configuration with FCT.

Does it need the maximum number of Newton iterations? What determines whether it's needed or not?

Regarding a stable configuration with FCT, should this be a separate issue, given that SSP doesn't seem to be stable with FCT? I'd like to keep the scope of this issue narrow so that it can be closed in a finite time window.

szy21 commented 9 months ago

Just checked some old notes: SSP may need more than one Newton iteration, even for CD; see #1440 and #1441. I'm fine with splitting the issue into ARS and SSP to keep the scope narrow.

tapios commented 9 months ago

In the Gardner et al. paper, they found they needed multiple Newton iterations for SSP (but only one for ARS).
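To make "number of Newton iterations" concrete: the implicit stages of an IMEX scheme require solving a nonlinear system, and the solver may stop after a fixed number of Newton iterations. A toy scalar sketch, assuming a single backward-Euler stage and a made-up stiff ODE dy/dt = -λy³ (this is not the ClimaAtmos solver or either IMEX tableau):

```python
from typing import Tuple


def backward_euler_step(y_old: float, dt: float, lam: float, newton_iters: int) -> Tuple[float, float]:
    """One backward-Euler step for the toy ODE dy/dt = -lam * y**3.

    Solves g(y) = y - y_old + dt * lam * y**3 = 0 with a fixed number of Newton
    iterations, starting from y_old, and returns (y, |g(y)|).
    """
    y = y_old
    for _ in range(newton_iters):
        g = y - y_old + dt * lam * y**3      # nonlinear residual of the implicit stage
        dg = 1.0 + 3.0 * dt * lam * y**2     # its derivative (the 1x1 "Jacobian")
        y -= g / dg                          # Newton update
    residual = abs(y - y_old + dt * lam * y**3)
    return y, residual


# Large dt * lam makes the stage stiff, so one iteration leaves a large residual.
for iters in (1, 3, 10):
    y, res = backward_euler_step(y_old=1.0, dt=10.0, lam=1.0, newton_iters=iters)
    print(f"{iters:2d} Newton iteration(s): y = {y:.6f}, residual = {res:.2e}")
```

With one iteration the stage reduces to a single linearized implicit step, so the nonlinear equation is only approximately satisfied; additional iterations drive the residual toward zero. How much that inexactness matters depends on the scheme, which is what the SSP runs with 3 iterations probe.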

SSP with CD should be stable with multiple Newton iterations. That's what Gardner et al. found. I'd like to know a baseline timestep for this.

If SSP IMEX is always unstable with FCT, we can table it for now (but document results in an issue). It must mean that there's something wrong with the implementation of SSP with limiters. In that case, we can use ARS for now and revisit this later (e.g., when @OsKnoth is visiting).

charleskawczynski commented 9 months ago

I've added SSP_lim_CD and SSP_nolim_CD to the table above. SSP seems to be unconditionally unstable. I'll retry both with more Newton iterations.

charleskawczynski commented 9 months ago

It turns out that SSP is unstable even with 3 Newton iterations.

szy21 commented 9 months ago

> It turns out that SSP is unstable even with 3 Newton iterations.

Some of the jobs are stable; the build failed only because the artifact file was too large to upload (e.g., https://buildkite.com/clima/climaatmos-ci/builds/16221#018d47cf-326f-4a81-aea7-775ae00ea867). I'm fixing it.