NCAR / ccpp-scm

CCPP Single Column Model

Alarming result differences using shorter timesteps (GFS_v16) #342

Closed · gthompsnWRF closed this 2 years ago

gthompsnWRF commented 2 years ago

Included in this issue is a graphic to demonstrate the point I am trying to make.

Here is a comparison of a run I call Control, which uses the exact same suite definition file and namelist, unedited from the ccpp-scm repo, with GFS_v16 and the ARM-SGP case (about 27 days simulated), versus a run with shorter timesteps and a smaller column_area. The top panel of the graphic shows Control, while the bottom panel shows Exp1, which has the following four changes:

  1. dynamics timestep reduced from 600 to 60s
  2. microphysics timestep reduced from 150 to 60s
  3. radiation timestep reduced from 3600 to 600s
  4. column_area reduced from 2E9 m^2 (about 45 km DX) to 1E7 m^2 (about 3 km DX); see the sketch after this list
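For reference, column_area maps to an equivalent grid spacing by taking the square root of the area (treating the column as a square). A quick back-of-the-envelope check in plain Python (my own sketch, not part of the SCM scripts) confirms the DX values quoted above:

```python
import math

# Equivalent grid spacing implied by the column_area values above,
# treating each column as a square (a back-of-the-envelope check only).
for area_m2 in (2e9, 1e7):
    dx_km = math.sqrt(area_m2) / 1000.0
    print(f"column_area = {area_m2:.0e} m^2  ->  DX ~ {dx_km:.1f} km")
# 2e9 m^2 -> DX ~ 44.7 km; 1e7 m^2 -> DX ~ 3.2 km
```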

The rationale for these changes is a fairer comparison at HRRR resolution when switching from GFS physics to GSD physics. I was attempting to determine whether the large change of physics suite from GFS to GSD was primarily responsible for the large differences I am seeing.

The issue I am raising is two-fold. The most obvious change, in the plot of cloud water content, is an explosion in the frequency of low clouds. There may actually be nothing wrong with this; while the change might be a bit dramatic, the far larger issue I am seeing is the very dramatic rise in temperature, nearly 5 degrees, particularly in the middle atmosphere. The solid lines in warm colors (reds/oranges) are at 5 C intervals above zero, the solid light blue line centered near 600 hPa is 0 C, and the dashed lines in cool colors (blues/purples) are at 5 C intervals below zero. There is also a "compaction" of the temperature lines aloft near 100-200 hPa.

I am very alarmed by this. Remember, the time period is mid-June to mid-July (1997) and I expect very hot weather in central Oklahoma, but approaching 0 C at 500 hPa is extremely rare anywhere in the USA. For context, the truly massive heat bubble of Summer 2021 had core 500-hPa temperatures around -2 C.

I do not believe that shorter, seemingly quite valid timesteps should produce warming like this. The problem is even worse when switching to the GSD physics suite. Can anyone offer some explanations?

(attached figure: tstep_comparison)

grantfirl commented 2 years ago

@gthompsnWRF I think that there are a few things that could explain the behavior that you're seeing, written in my expected order of importance:

  1. By reducing the column area, you're triggering the scale-awareness of the deep convection scheme, effectively shutting the deep convective scheme down for a case that should produce deep convection. For a 3D model with a dycore that can (sort of) resolve deep convection at the scale you chose, this is no problem. But for an SCM without a dycore and no means by which to "take over" deep convection, you're then counting on the rest of the physics to compensate for the fact that there is no deep convection scheme redistributing heat/moisture/etc. while the forcing is attempting to destabilize the column. For your experiment to work, you would need to basically account for the deep convection within your forcing (which represents how a dycore would be changing the column). See the first sketch after this list for a conceptual picture of the scale-aware damping.
  2. Going along with the first point, the forcing is derived assuming a certain horizontal scale, roughly synonymous with the horizontal area of the observational field campaign domain. IMO, it is not scientifically valid to apply the same forcing for horizontal grid sizes an order of magnitude apart.
  3. Given the timestamps on your plots, it looks like you're using time period "X" of the ARM SGP Summer 1997 case, which is 29 days. IMO, this is simply too long to simulate in an SCM without reinitializing or configuring your forcing to be responsive to the modeled state. As soon as the simulation accumulates enough error from the physics to put the state outside of the intended meteorological regime, it is effectively useless beyond that point. How good is the UFS forecast after running for 29 days without DA or reinitializing?
  4. Alluding to the third point, this case uses the "revealed" forcing method for T and q, i.e. the specification of the total advective tendencies in one term. In the SCM literature, this forcing is known to produce unrealistic states after a fairly short time, including runaway temperature increases, etc., which is why most SCM cases in the literature prefer to specify horizontal advective tendencies and a vertical velocity to calculate the vertical advective term using the vertical derivative of the modeled state. That method at least has a chance at responding better to errors in the physics. But, IMO, short of nudging profiles back to observations or some reference profile, no forcing method will give good fidelity in an SCM for longer than a few days, depending on the meteorological regime. The second sketch after this list contrasts the two forcing approaches.
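Regarding point 1, here is a conceptual sketch of how a scale-aware deep convection scheme damps its tendencies as the column area shrinks. This is not the actual scheme source; the (1 - sigma)**2 form and the assumed updraft area are illustrative assumptions only:

```python
import math

# Conceptual sketch of scale-aware damping of deep-convective tendencies.
# NOT the actual convection scheme code; the (1 - sigma)**2 form and the
# assumed updraft area are illustrative assumptions only.

def scale_aware_factor(column_area_m2, updraft_area_m2=2.5e7):
    # sigma: assumed fraction of the column occupied by convective updrafts.
    # As the column shrinks toward the updraft scale, sigma -> 1 and the
    # parameterized tendency is damped toward zero.
    sigma = min(updraft_area_m2 / column_area_m2, 1.0)
    return (1.0 - sigma) ** 2

for area in (2e9, 1e7):
    dx_km = math.sqrt(area) / 1000.0
    print(f"column_area = {area:.0e} m^2 (~{dx_km:.0f} km DX): "
          f"tendency factor = {scale_aware_factor(area):.3f}")
# ~45 km DX: factor near 1 (scheme essentially fully active)
# ~3 km DX:  factor 0 (scheme effectively shut off)
```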
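Regarding point 4, a minimal sketch of the difference between the two forcing approaches (illustrative only, not the SCM's actual forcing routines; the variable names are assumptions):

```python
import numpy as np

def revealed_forcing(total_adv_tend):
    # "Revealed" forcing: the total (horizontal + vertical) advective
    # tendency is prescribed from the case file and never responds to the
    # modeled state.
    return total_adv_tend

def w_based_forcing(horiz_adv_tend, w, state, z):
    # Horizontal advective tendency is prescribed, but vertical advection is
    # recomputed every step from the *modeled* profile, so the forcing can
    # partially respond to errors in the physics.
    dstate_dz = np.gradient(state, z)
    return horiz_adv_tend - w * dstate_dz

# Toy usage with made-up profiles
z = np.linspace(0.0, 15000.0, 16)            # height (m)
T = 300.0 - 0.0065 * z                       # simple lapse-rate profile (K)
w = np.full_like(z, 0.05)                    # prescribed ascent (m/s)
horiz_adv = np.full_like(z, -1.0 / 86400.0)  # prescribed horizontal cooling (K/s)
print(w_based_forcing(horiz_adv, w, T, z)[0])
```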

To test my hypothesis (that the small column area is the main culprit), I ran the arm_summer_sgp_1997_X case with 1) the control GFS_v16 setup, 2) with your reduced timestep and 3) with the original timestep and the smaller area. Simulations 1 and 2 were reasonably close to each other (but still probably bunk due to points 3 and 4 above) and simulation 3 had the same problems as your Exp1.

gthompsnWRF commented 2 years ago

@grantfirl I appreciate the points in the reply, but I still strongly suspect a coding error. Regarding your point #1 above, I had already considered the column_area change. Therefore, I have also run a case with the same settings as my Exp1 but with both deep and shallow convection shut off; after all, I'm expecting a 3-km(ish) simulation using my microphysics to handle convection at that scale. Below is the same figure for my Exp2 with GFS_v16, and the problem looks even worse than before.

In fact, note how practically every single time there is cloud water (again, this is GFDL microphysics, not mine), there is quite a temperature jump. At times, as soon as the cloud disappears, the temperature drops off a cliff; when the cloud doesn't evaporate, the rise in temperature may slowly decay, but over the duration the temperature is constantly rising. With this in mind, it's as if the temperature tendency variable isn't reset to zero every timestep but is instead summed over prior timesteps and just keeps building.
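To illustrate the kind of bug I am imagining (purely hypothetical code, not taken from the GFDL microphysics or any CCPP scheme):

```python
import numpy as np

# Hypothetical illustration of the suspected bug: if a per-timestep tendency
# array is never zeroed before each call, the applied heating grows without
# bound even when the scheme itself computes a sensible tendency each step.
nlev, nsteps, dt = 5, 4, 60.0
dtdt = np.zeros(nlev)          # temperature tendency (K/s)
T_ok = np.full(nlev, 280.0)    # state updated with a freshly computed tendency
T_bug = np.full(nlev, 280.0)   # state updated with an accumulating tendency

for step in range(nsteps):
    new_tend = np.full(nlev, 1.0e-4)  # pretend the scheme returns this every step

    # correct behavior: tendency re-initialized every timestep
    T_ok += new_tend * dt

    # suspected bug: tendency accumulates across timesteps
    dtdt += new_tend
    T_bug += dtdt * dt

print(T_ok[0], T_bug[0])  # 280.024 vs 280.06 after only 4 steps; the gap keeps growing
```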

Your point #2 makes little sense to me. It's as if we have this CCPP-SCM tool, but it can only be used exactly as each case was pre-designed, and nothing changed from that is worthwhile. Is a 3x3 km column in central OK not representative of a 45x45 km column? I'm getting more than a little disillusioned about whether this tool is valid for developers to insert/test new physics if we always have to use the scales given. How is it that the heat generated (probably by microphysics) dissipates in the control experiment at coarser resolution but cannot at the finer resolution?

I struggle with your point #3, because I would certainly trust that a 29-day UFS simulation of real-world weather would not have an ever-increasing temperature. Obviously, I wouldn't much trust the evolution of the simulated weather to match the true weather, but that's a different point.

Regarding point #4, I readily admit that SCM forcing is something I have near-zero experience with. But if the dependability of forecast results is effectively trash after a couple of days, why even have the ARM-SGP case with the "X" setting for 29 days?

I'm not at all trying to be argumentative. I am trying to use the tool to evaluate physics code changes, but results such as these give me immense pause. It's beginning to look like code changes can only be tested with full-scale UFS model runs. I suppose I can try another case like ASTEX, but I am greatly struggling with how I should interpret results. We both know it doesn't rain for 20+ straight days in central OK in mid-summer, which is what happens in the ARM-SGP case.

(attached figure: ccpp_scm_mp_exp2)

grantfirl commented 2 years ago

@gthompsnWRF Regarding the inclusion of time period X of the ARM SGP case, it was included for "completeness". Due to my points 3 and 4 above, I doubt anyone with much SCM experience would trust anything coming out of a 29-day integration with revealed forcing. Please see the attached description of the case (Cederwall et al.), which includes descriptions of the time periods. It says that time period X is included to "study SCM performance in terms of model drift over a long period", which, IMO, could only really be useful with "full" physics that does not get deactivated by the scale-awareness at the chosen horizontal grid scale. Another use for the period X data is for users to develop a shorter case using a different subset of the data than those already defined. Time period A is the one that I've seen in the literature for model intercomparison purposes, and I would stick with that.

Your point that the UFS (probably) wouldn't have a runaway temperature after 29 days is disingenuous because the UFS would not be artificially forced at every column to produce such an effect.

You say that you're expecting microphysics to handle all convection at the 3 km scale. Like I said, for a 3D model, perhaps that's valid, because heating produced by microphysics would induce a dynamical response (vertical motion, divergence aloft, surface convergence, etc.) to redistribute that heating. In an SCM without a dycore (which is, in effect, replaced by the forcing terms), that same heating only provokes a local response at N levels within the column, which is then acted upon by the physics in subsequent calls. For the forcing setup in this case, the advective T tendencies are the same regardless of what the physics is doing (there is no response). See the following quick/dirty plot of the T forcing:

(attached figure: arm_sgp_period_X_T_forcing)

I'm not at all surprised that your Exp2 is worse than Exp1. It's making the same mistake as experiment 1, only worse.

I don't think that the coding error you describe for the microphysics is possible in this case. The microphysics schemes in the UFS are used in time-split mode and update the state internally; to the extent that tendencies are calculated from microphysics, they are merely diagnostic. Since the EXACT same physics code is executed in the SCM and the UFS, such a coding error would have to affect the UFS as well, and I'm not aware of any evidence for that.
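Schematically, the time-split update looks like the following (hypothetical names and a stand-in heating term; this is not the actual CCPP interstitial or GFDL microphysics code). The point is that the diagnosed tendency never feeds back into the state update:

```python
import numpy as np

# Schematic time-split update (hypothetical names; not actual CCPP code).
# The scheme advances the state internally and returns the new state; the
# tendency is computed afterwards purely for diagnostics, so an accumulation
# bug in the tendency could not heat the column.
def timesplit_microphysics_step(T, dt):
    T_new = T + 0.5e-4 * dt        # stand-in for the scheme's internal state update
    dtdt_diag = (T_new - T) / dt   # diagnostic tendency only; never applied again
    return T_new, dtdt_diag

T = np.full(5, 280.0)
for _ in range(3):
    T, dtdt_diag = timesplit_microphysics_step(T, dt=60.0)
print(T[0], dtdt_diag[0])
```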

Is a 3x3 km column in central OK not representative of a 45x45 km column?

It 100% depends. For this case, with deep convection, I would say absolutely not. The 45x45 km area is an average of 225 3x3 km areas. Are you saying that in a convective environment every 3x3 km area has the same vertical motion? Surely not. If you wanted to develop forcing for every 3x3 km area within the 45x45 km area, surely there would be a distribution of strengths of the convectively induced heating/cooling, some MUCH stronger and some much weaker, the average of which is the 45x45 km forcing in the case data files. I'm guessing that there would be a way to downscale the forcing that is valid for the 45x45 km case using the original observations and some algorithm, but that would certainly take some digging in the literature and considerable unfunded, enlarged-scope effort to accomplish.
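A toy illustration of that averaging argument (made-up numbers, not a downscaling algorithm):

```python
import numpy as np

# 225 hypothetical 3x3 km subcolumn forcings whose mean equals the single
# prescribed 45x45 km forcing, even though individual subcolumns differ wildly.
rng = np.random.default_rng(0)
n_subcolumns = 225                 # (45 km / 3 km)**2
mean_forcing = -5.0                # arbitrary domain-mean T forcing (K/day)
subcolumn_forcing = rng.normal(loc=mean_forcing, scale=20.0, size=n_subcolumns)

print(subcolumn_forcing.mean())                           # close to -5 K/day
print(subcolumn_forcing.min(), subcolumn_forcing.max())   # individual columns vary enormously
```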

Perhaps your expectations of what can be accomplished with the given set of cases are too high; the scope of their use is fairly narrow. One could imagine running a UFS simulation at 3 km with the convection turned off and saving the advective tendencies (which would then include the dynamical response generated at that scale by the microphysical heating/cooling) for use in the SCM. That would be a more valid experiment, and we're working on this capability.

(attachment: cederwall_1999b.pdf)

climbfuji commented 2 years ago

@grantfirl @gthompsnWRF Should this issue be moved to ccpp-scm?

grantfirl commented 2 years ago

IMO, this issue can be closed. The original issue was explained and I'm not sure that any code needs to change as a result of this. It's more about expectations of how to use an SCM in general and the provided cases.