NOAA-SWPC / IPE

Ionosphere Plasmasphere Electrodynamics Model
GNU General Public License v3.0
6 stars 10 forks source link

WAM-IPE issues on Phase 3 #11

Closed akubaryk closed 4 years ago

akubaryk commented 5 years ago

Having done no testing myself, this is from @wanghj2015 in NOAA-SWPC/WAM-IPE#334:

When the 'no_netcdf' version run on phase 3 with debug mode, it caught the error:

forrtl: error (73): floating divide by zero:
NEMS.x 0000000000621DD1 module_med_swpc_m 3233 module_MEDIATOR_SWPC_methods.f90
NEMS.x 0000000000616874 module_med_swpc_m 2960 module_MEDIATOR_SWPC_methods.f90

Or this line in module_MEDIATOR_SWPC_methods.F90:

        data_curr=data_prev*exp((hgt_prev-hgt_curr)/H_avg)
wanghj2015 commented 5 years ago

For phase 3 issue, using Raffaele’s suggestion for now:

You may just skip that line until we settle on a cleaner solution:

if (H_avg > 0) then data_curr=data_prev*exp((hgt_prev-hgt_curr)/H_avg) else data_curr=data_prev endif

And cycled runs on phase 3 started.

akubaryk commented 5 years ago

Fix has been incorporated into ipe_refactor_dev_wam with the latest commit.

rmontuoro commented 5 years ago

This problem is caused by WAM fields being imported with 0 values at the first coupling time step, and it occurs only in legacy code (newer code includes a sanity check).

Regridding and interpolation/extrapolation should be skipped in the SWPC mediator for the first coupling time step, so the committed fix may not be necessary.

akubaryk commented 5 years ago

Apparently this only occurs when the GSM is compiled with debug flags, which aren't well supported. I'm inclined to revert ba7d9e9 and close this -- let me know if anybody has any thoughts, will go ahead with this plan later in the week if nobody objects.

wanghj2015 commented 5 years ago

Found that runs on phase produced 'NaN's in molecular ions. An example showing the zonal mean field of no+:

image

akubaryk commented 5 years ago

This isn't something specific to Phase 3, it's an issue with the WAM initial conditions. (On Phase 3) I swapped in some balanced WAM ICs we have that are labeled for 2015030100, changed the initialization date in the sigio/sfcio header to 2019060400, and after integrating for several hours, there are no NaNs in any of the IPE fields.

It's worth noting that there are no stability warnings from the GSM in my run, while they are present in abundance when starting with the DA-enabled WAM ICs. This strongly suggests that the imbalances created from the analysis increment are causing some issues within IPE. What the exact cause is needs further investigation.

wanghj2015 commented 5 years ago

@akubaryk It would be worthy to know what kind of "stability warnings"? Trying to figure it out. Would you mind to copy and paste some of them? Too many printouts to figure out what's trying to look for. Do you get 'NaN's also?

Could it be mismatch between WAM ICs and IPE ICs also? Do we need some new ICs for IPE that match the current season and solar activities?

These NaNs appear to be near the lower/inner flux-tubes?

akubaryk commented 5 years ago

No. The coupling is one-way, so IPE has nothing to do with the stability of WAM. I could use a September IC, too, and the result would be the same as long as the GSM portion is balanced.

grep for xvcn or nvcn in the output log.

I believe these warnings come from the vertical advection filter. If you look into NEMS/src/atmos/gsm/dyn/gfidi_hyb_gc.f (there are other gfidi* source files with these print statements, so without looking into it I'm not sure exactly which one it is), the print statement happens after a call to vcnhyb_gc (or vcnhyb_gc_h or vcnhyb_gchdp), and nvcn is listed in that subroutine as "integer number of points requiring filtering."

In my experiences looking at cases where the GSM crashes from instability (eventually throwing NaNs), these types of warnings become extremely common. The warnings certainly appear during storm-time runs from balanced ICs, but I think it is very, very unusual for them to appear in quiet time starting from balanced initial conditions.

akubaryk commented 5 years ago

A code sample from subroutine vcnhyb_gc

      do k=1,km-1
        do i=1,im
          zdm=abs(zint(i,k)-zint(i,k+1))
          zdm=min( zdm, abs(zint(i,k+1)-zint(i,k+2)) )
          vcn(i,k)=abs(zdot(i,k+1)*dt/zdm)*1.1
          lvcn(i)=lvcn(i).or.vcn(i,k).gt.1.0
          xvcn=max(xvcn,vcn(i,k))
        enddo
      enddo
      if(xvcn.gt.1.0) then
        do i=1,im
          if(lvcn(i)) then
            ivcn(nvcn+1)=i
            nvcn=nvcn+1
          endif
        enddo
      endif

Look at the assignment of vcn: it's basically checking if w*dt/layer_dz > 1, or vertical velocities are greater than dz at a given layer with a given timestep. So we're getting very significant vertical velocities from the time of initialization due to DA. That's bad.

wanghj2015 commented 5 years ago

@akubaryk Thanks for these info. I searched my printout/dayfile and didn't find a lot of these warning, e.g.,

[swpc.spacepara@v71a3 prwam.201908]$ grep nvcn wam2019060406gdasfcst1_1.dayfile|wc -l 31 [swpc.spacepara@v71a3 prwam.201908]$ grep nvcn wam2019060606gdasfcst1_0.dayfile|wc -l 139 [swpc.spacepara@v71a3 prwam.201908]$ grep nvcn wam2019060618gdasfcst1_0.dayfile|wc -l 8

Maybe because you don't use any initialization scheme, such as digital filtering (DF), you got a lot of warning. But in these runs, we use 'IAU'; and gfs would have used DF too. Also 3 min time stepping were used in these runs; will use 1 min for wam and 3 min for ipe.

And it would be helpful for IPE too if it is initialized appropriately.

Thanks for bring out these points.

akubaryk commented 5 years ago

Right, but I get zero nvcn warnings from balanced ICs, and the balanced ICs don't produce NaNs in IPE despite using the exact same IPE IC and input forcing parameters: literally the only difference is the GSM neutrals.

I'm happy to run with FHDFI != 0 to see if that helps, but surely you can agree that there's only one variable here: the neutrals. The primary difference between the balanced ICs and your ICs is the data assimilation component.

The timestep should not matter too too much, but I'm looking into a 1min/3min split right now.

wanghj2015 commented 5 years ago

The time step matter a lot: 3 min vs 1 min (both without initialization schemes):

[swpc.spacepara@v71a3 prwam.201909]$ grep nvcn wam2019060100gdasfcst1_1.dayfile|wc -l 4452 [swpc.spacepara@v71a3 prwam.201909]$ grep nvcn wam2019060100gdasfcst1_2.dayfile|wc -l 17

akubaryk commented 5 years ago

Are you still getting NaN's out of the fcst1_2.dayfile? There's a root issue here of which the stability warnings are only a symptom.

akubaryk commented 5 years ago

Using the 2019060400 WAM ICs, I ran with 1min WAM, 3min IPE, FHDFI=3, and while the nvcn errors have not appeared through five hours, the NaN's appear at 02Z IPE output just like my original test.

wanghj2015 commented 5 years ago

No 'NaN's were found in a 24-hr fcst on 'theia' for the same run using the same 2019060400 WAM ICs. [Note: this is a run with the empirical efield]. x2

[Update on 20190919: This run was actually using the faked (I guess) '2019060400' ics from: IC_DIR=/scratch3/NCEPDEV/swpc/noscrub/Adam.Kubaryk/ICS/WAM-IPE/2019060400]

wanghj2015 commented 5 years ago

The analysis incremental for siganl.gdas.2019060400: ncview dt ncview du ncview dv

gmillward commented 4 years ago

Molecular NaNs - seek and destroy - works:

no_nans

twfang commented 4 years ago

See updates in NOAA-SWPC/IPE#3