The hyperspectral IR passive channel statistics behave differently due to modifications made for correlated observation error

emilyhcliu commented 1 year ago

The release/gfsda.v16 branch has been merged with the develop into gfsda.v16_merge branch.

PR #526 documents the merge

GSI single-cycle test using the merged branch gfsda.v16_merge found an issue with passive channels for hyperspectral sensors. The results are compared to the operational output.

The printout is identical for assimilated channels 16 and 38. The printout is very different for monitored channels 29, 32, and 35. Similar behavior is found for monitored cris-fsr_n20 and iasi_metop-b channels. fort.207 stats for assimilated channels are identical. fort.207 stats for monitored channels differ.

ops-2.2023012800

   99   16 iasi_metop-c      10134     55      1.380  -0.2640014  -0.0464082   0.4738062   0.4637597   0.4614318
  100   29 iasi_metop-c      10181      3     -0.810  -0.0019946  -0.0328516   0.2409105   0.3983508   0.3969939
  101   32 iasi_metop-c       9461      6     -0.750   0.0945998  -0.0240327   0.2880045   0.4025384   0.4018203
  102   35 iasi_metop-c      10184      0     -0.790   0.0156703  -0.0443345   0.2441537   0.3911598   0.3886392
  103   38 iasi_metop-c       9741     17      0.720   0.1351438  -0.0176393   0.4004059   0.3911234   0.3907254

gsi-merge-2.2023012800

   99   16 iasi_metop-c      10134     55      1.380  -0.2640014  -0.0464082   0.4738062   0.4637597   0.4614318
  100   29 iasi_metop-c       1926      3     -0.810   0.0564427   0.1078378   0.2517949   0.4072287   0.3926910
  101   32 iasi_metop-c       1916      6     -0.750   0.1743745   0.0741829   0.2535199   0.3776706   0.3703134
  102   35 iasi_metop-c       1926      0     -0.790   0.0608931   0.0879048   0.2445232   0.3914403   0.3814424
  103   38 iasi_metop-c       9741     17      0.720   0.1351438  -0.0176393   0.4004059   0.3911234   0.3907254

The above-mentioned features found in the passive channels for hyperspectral sensors are related to the correlated observation error changes from PR #443. Note that the release/gfsda.v16 does not have changes from PR #443

The following code snippets are the sections related to correlated observation errors from the gfsda.v16_merge and release/gfsda.v16, respectively.

setuprad.f90: section related to correlated observation error.

The following code snippet shows howvarinv is used in calculations related to correlated obs.
It seems to me that only the active channels (varinv > 0 and use flag is 1) are involved (check varinv0) The use of varinv0 is to make sure only active channels are involved.

gfsda.v16_merge

        diagadd=zero
        account_for_corr_obs = .false.
        iii=0
        varinv0=zero
        do ii=1,nchanl
           m=ich(ii)
           if (varinv(ii)>tiny_r_kind .and. iuse_rad(m)>=1) then
             iii=iii+1
             varinv0(ii)=varinv(ii)
             raterr2(ii)=error0(ii)**2*varinv0(ii)
             if (l_may_be_passive .and. .not. retrieval) then
               if(optconv > zero .and. (iasi .or. cris) .and. iinstr /= -1)then
                 asum=zero
                 do k=1,nsig
                   asum=asum+abs(jacobian(iqs+k,ii))*qs(k)
                 end do
                 diagadd(ii)=optconv*asum
               end if
             end if
           end if
        enddo
        err2 = one/error0**2
        tbc0=tbc
        tb_obs0=tb_obs
        wgtjo=varinv0
        if (l_may_be_passive .and. .not. retrieval) then
          if(iii>0 .and. iinstr.ne.-1)then
            chan_count=(iii*(iii+1))/2
            if(allocated(rsqrtinv))deallocate(rsqrtinv)
            if(allocated(rinvdiag))deallocate(rinvdiag)
            allocate(rsqrtinv(chan_count))
            allocate(rinvdiag(iii))
            rsqrtinv=zero
            rinvdiag=zero
            account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv0,diagadd,&
                                    tbc,tb_obs,err2,raterr2,wgtjo,jacobian,cor_opt,iii,rsqrtinv,rinvdiag)
            varinv=wgtjo
          endif
        endif

The following code snippet shows how varinv is used in calculations related to correlated obs.
It seems to me that all channels are involved. varinv0 here does not do anything.

gfsda.v16

        tbc0=tbc
        tb_obs0=tb_obs
        varinv0 = varinv
        raterr2 = zero
        err2 = one/error0**2
        wgtjo= varinv     ! weight used in Jo term
        account_for_corr_obs = .false.
        if (l_may_be_passive .and. .not. retrieval) then
          iii=0
          do ii=1,nchanl
             m=ich(ii)
             if (varinv(ii)>tiny_r_kind .and. iuse_rad(m)>=1) then
               iii=iii+1
               raterr2(ii)=error0(ii)**2*varinv(ii)
             endif
          enddo
          if(iii>0 .and. iinstr.ne.-1)then
            chan_count=(iii*(iii+1))/2
            allocate(rsqrtinv(chan_count))
            allocate(rinvdiag(iii))
            rsqrtinv=zero
            rinvdiag=zero
            account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv,&
                                               tbc,tb_obs,err2,raterr2,wgtjo,jacobian,cor_opt,iii,rsqrtinv,rinvdiag)
            varinv = wgtjo
          endif
        endif

emilyhcliu commented 1 year ago

Experimented with the following modification for gfsda.v16_merge: Look for emily_test for changes:

        diagadd=zero
        account_for_corr_obs = .false.
        iii=0
        varinv0=zero
        do ii=1,nchanl
           m=ich(ii)
           if (varinv(ii)>tiny_r_kind .and. iuse_rad(m)>=1) then
             iii=iii+1
             varinv0(ii)=varinv(ii)
             raterr2(ii)=error0(ii)**2*varinv0(ii)
             if (l_may_be_passive .and. .not. retrieval) then
               if(optconv > zero .and. (iasi .or. cris) .and. iinstr /= -1)then
                 asum=zero
                 do k=1,nsig
                   asum=asum+abs(jacobian(iqs+k,ii))*qs(k)
                 end do
                 diagadd(ii)=optconv*asum
               end if
             end if
           end if
        enddo
        err2 = one/error0**2
        tbc0=tbc
        tb_obs0=tb_obs
!       wgtjo=varinv0  !orig
        wgtjo=varinv   !emily_test
        if (l_may_be_passive .and. .not. retrieval) then
          if(iii>0 .and. iinstr.ne.-1)then
            chan_count=(iii*(iii+1))/2
            if(allocated(rsqrtinv))deallocate(rsqrtinv)
            if(allocated(rinvdiag))deallocate(rinvdiag)
            allocate(rsqrtinv(chan_count))
            allocate(rinvdiag(iii))
            rsqrtinv=zero
            rinvdiag=zero
       !    account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv0,diagadd,&    !orig
            account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv,diagadd,&     !emily_test
                                    tbc,tb_obs,err2,raterr2,wgtjo,jacobian,cor_opt,iii,rsqrtinv,rinvdiag)
            varinv=wgtjo
          endif
        endif

Results from the modified code for gfsda.v16_merge

Only the first 25 channels are shown below

   99   16 iasi_metop-c      10134     55      1.380  -0.2640014  -0.0464082   0.4738062   0.4637597   0.4614318
  100   29 iasi_metop-c      10181      3     -0.810  -0.0019946  -0.0328516   0.2409105   0.3983508   0.3969939
  101   32 iasi_metop-c       9461      6     -0.750   0.0945998  -0.0240327   0.2880045   0.4025384   0.4018203
  102   35 iasi_metop-c      10184      0     -0.790   0.0156703  -0.0443345   0.2441537   0.3911598   0.3886392
  103   38 iasi_metop-c       9741     17      0.720   0.1351438  -0.0176393   0.4004059   0.3911234   0.3907254
  104   41 iasi_metop-c      10184      0     -0.740   0.0664495  -0.0366581   0.2530276   0.3730187   0.3712131
  105   44 iasi_metop-c       9940      2     -0.680   0.1438601  -0.0177988   0.2949553   0.3693846   0.3689556
  106   47 iasi_metop-c      10184      0     -0.720   0.1235954  -0.0354098   0.2598330   0.3677800   0.3660714
  107   49 iasi_metop-c      10170     19      0.650   0.1848115  -0.0159772   0.4066348   0.3574698   0.3571126
  108   50 iasi_metop-c       9998      3     -0.650   0.1492998  -0.0058246   0.3152531   0.3650583   0.3650119
  109   51 iasi_metop-c       9852     19      0.650   0.2263103  -0.0112212   0.3756883   0.3519649   0.3517860
  110   53 iasi_metop-c      10183      1     -0.690   0.1517136  -0.0417373   0.2727302   0.3610617   0.3586413
  111   55 iasi_metop-c      10162     27      0.640   0.1991875  -0.0256447   0.3995409   0.3453456   0.3443921
  112   56 iasi_metop-c      10003      6     -0.640   0.1360557  -0.0122915   0.3035011   0.3526938   0.3524796
  113   57 iasi_metop-c       9799     16      0.650   0.2160133  -0.0105742   0.3801994   0.3491505   0.3489903
  114   59 iasi_metop-c      10167     22      0.670   0.2053565  -0.0309912   0.4516680   0.3529203   0.3515570
  115   61 iasi_metop-c      10166     23      0.620   0.2440017  -0.0264197   0.3836243   0.3370525   0.3360155
  116   62 iasi_metop-c      10163     11     -0.610   0.1605307  -0.0167305   0.3211786   0.3458156   0.3454107
  117   63 iasi_metop-c       9743     19      0.620   0.1918069  -0.0152163   0.3880877   0.3433142   0.3429768
  118   66 iasi_metop-c      10168     21      0.640   0.1235555  -0.0324460   0.4445170   0.3426555   0.3411158
  119   68 iasi_metop-c      10029      5     -0.590   0.1288132  -0.0088820   0.3383952   0.3433175   0.3432026
  120   70 iasi_metop-c       9893     14      0.760   0.0373880  -0.0027445   0.4393032   0.3605423   0.3605319
  121   72 iasi_metop-c      10147     42      1.220  -0.0569219  -0.0384008   0.4757747   0.4032983   0.4014660
  122   74 iasi_metop-c      10173     16      0.780   0.1424242  -0.0189679   0.3862381   0.3449180   0.3443960
  123   76 iasi_metop-c      10183      1     -0.640   0.2130095  -0.0143614   0.2760107   0.3363319   0.3360251

Compare to what's in the operational gfs:

   99   16 iasi_metop-c      10134     55      1.380  -0.2640014  -0.0464082   0.4738062   0.4637597   0.4614318
  100   29 iasi_metop-c      10181      3     -0.810  -0.0019946  -0.0328516   0.2409105   0.3983508   0.3969939
  101   32 iasi_metop-c       9461      6     -0.750   0.0945998  -0.0240327   0.2880045   0.4025384   0.4018203
  102   35 iasi_metop-c      10184      0     -0.790   0.0156703  -0.0443345   0.2441537   0.3911598   0.3886392
  103   38 iasi_metop-c       9741     17      0.720   0.1351438  -0.0176393   0.4004059   0.3911234   0.3907254
  104   41 iasi_metop-c      10184      0     -0.740   0.0664495  -0.0366581   0.2530276   0.3730187   0.3712131
  105   44 iasi_metop-c       9940      2     -0.680   0.1438601  -0.0177988   0.2949553   0.3693846   0.3689556
  106   47 iasi_metop-c      10184      0     -0.720   0.1235954  -0.0354098   0.2598330   0.3677800   0.3660714
  107   49 iasi_metop-c      10170     19      0.650   0.1848115  -0.0159772   0.4066348   0.3574698   0.3571126
  108   50 iasi_metop-c       9998      3     -0.650   0.1492998  -0.0058246   0.3152531   0.3650583   0.3650119
  109   51 iasi_metop-c       9852     19      0.650   0.2263103  -0.0112212   0.3756883   0.3519649   0.3517860
  110   53 iasi_metop-c      10183      1     -0.690   0.1517136  -0.0417373   0.2727302   0.3610617   0.3586413
  111   55 iasi_metop-c      10162     27      0.640   0.1991875  -0.0256447   0.3995409   0.3453456   0.3443921
  112   56 iasi_metop-c      10003      6     -0.640   0.1360557  -0.0122915   0.3035011   0.3526938   0.3524796
  113   57 iasi_metop-c       9799     16      0.650   0.2160133  -0.0105742   0.3801994   0.3491505   0.3489903
  114   59 iasi_metop-c      10167     22      0.670   0.2053565  -0.0309912   0.4516680   0.3529203   0.3515570
  115   61 iasi_metop-c      10166     23      0.620   0.2440017  -0.0264197   0.3836243   0.3370525   0.3360155
  116   62 iasi_metop-c      10163     11     -0.610   0.1605307  -0.0167305   0.3211786   0.3458156   0.3454107
  117   63 iasi_metop-c       9743     19      0.620   0.1918069  -0.0152163   0.3880877   0.3433142   0.3429768
  118   66 iasi_metop-c      10168     21      0.640   0.1235555  -0.0324460   0.4445170   0.3426555   0.3411158
  119   68 iasi_metop-c      10029      5     -0.590   0.1288132  -0.0088820   0.3383952   0.3433175   0.3432026
  120   70 iasi_metop-c       9893     14      0.760   0.0373880  -0.0027445   0.4393032   0.3605423   0.3605319
  121   72 iasi_metop-c      10147     42      1.220  -0.0569219  -0.0384008   0.4757747   0.4032983   0.4014660
  122   74 iasi_metop-c      10173     16      0.780   0.1424242  -0.0189679   0.3862381   0.3449180   0.3443960
  123   76 iasi_metop-c      10183      1     -0.640   0.2130095  -0.0143614   0.2760107   0.3363319   0.3360251

The first outer loop IASI and CrIS statistics for all channels are identical.

RussTreadon-NOAA commented 1 year ago

@emilyhcliu , I removed my self as an assignee for this issue. I alerted you to this problem because it looks odd. I do not know which output or which code is correct.

RussTreadon-NOAA commented 1 year ago

@emilyhcliu , please do not assign this issue to me. I cannot resolve the reported problem.

emilyhcliu commented 1 year ago

@emilyhcliu , please do not assign this issue to me. I cannot resolve the reported problem.

Got it

emilyhcliu commented 1 year ago

@ADCollard @jderber-NOAA I am not sure if the changes we spotted in the passive channels are intended. I think the results from the operational run are correct.
Do you have any suggestions?

ADCollard commented 1 year ago

@emilyhcliu. I am still working through this logic, but I have a quick question. When you make the change to gfsda.v16_merge, I see that the passive channel counts are improved (and I agree the operational values are correct). Do you see any change to the minimisation (i.e, are the analysis departures affected)? Apart from the CrIS passive channels we expect the analysis departures to be identical, correct?

emilyhcliu commented 1 year ago

@ADCollard The anlaysis will be different. This is because the develop also includes changes of replacing CRTM virtual temperature jacobian with the sensible temperature jacobian, and some minor bug fixes that will change the results. These changes are documented in PR 443 and Issue 375 .

For comparison of results from the release/gfsda.v16 and the merged branch, we expect that the first outer loop OmF statistics for radiance data should be identical since I am unaware of any changes from the develop that would alter the first outer loop.

I did the experiment on Dogwood. It was taken by NCO this morning for production. I will repeat the test again on Cactus. I did check the OmF for the 2nd and the 3rd loop. The changes are very small for active channels.

ADCollard commented 1 year ago

@emilyhcliu, I was asking about the difference between gfsda.v16_merge and the modified version where you just changed the call to corr_adjust_jacobian, are the departures after the first outer loop identical apart from the passive channels?

emilyhcliu commented 1 year ago

@ADCollard I think so. I can not double-check since the results are on Dogwood (now production) The running directories are still there on ptmp They should be in the following location on Dogwood:

The single-cycle run that reproduced the identical results to operational run: /lfs/h2/emc/ptmp/emily.liu/tmp766/ops-2.2023012800

The single-cycle run testing the merged branch: /lfs/h2/emc/ptmp/emily.liu/tmp766/gsi-merge-2.2023012800

The single-ccycle tun testing the merged branch with modifications to correlated obs part /lfs/h2/emc/ptmp/emily.liu/tmp766/gsi-merge-2-test.2023012800

ADCollard commented 1 year ago

Thanks, @emilyhcliu, I took a look at those files and confirmed that only the passive channels changed after the first outer loop.

jderber-NOAA commented 1 year ago

My question is "How do we want the passive channels to work when we have correlated error?" Seems to me if we include them in the correlated error in any way, this will change the results. I would think that we can only include them with a diagonal error.

ADCollard commented 1 year ago

@jderber-NOAA I think we do not want to try to allow for correlated error for the passive channels. As I understand it, it is only through the updating of the bias correction and (possibly) cloud detector where these channels would need correlated error and that should remain diagonal. Agree?

jderber-NOAA commented 1 year ago

Andrew,

Agreed.

jderber-NOAA commented 1 year ago

After looking at this for a while, I am thinking that including varinv rather than varinv0 in the call to corr_adjust_jacobian would result in the use of the correlated error for the passive channels. This is because corr_adjust_jacobian keys on that array to determine which channels are used for the correlated error. However, I think the other change "wgtjo=varinv" should fix the issue. It should result in varinv being used for the monitored channels and the monitored channels being used as if there was no correlated error.

I note that this would result in the use of the obs errors in the namelist file rather than the diagonal of the error variance matrix. To use these values the code would have to replace the wgtjo values with the diagonal matrix values one would have to modify the values in a way similar to that used in scale_jac for ErrorCov%method== 0. I am not sure that if one did this there would be any change in the result.

Could you please check the change suggested above to see the simple change of "wgtjo=varinv" fixes things?

emilyhcliu commented 1 year ago

@jderber-NOAA I modified the code section you suggested above: (1) still pass varinv0 to the correlated obs cal (2) but set wgtjo = varinv

Please see below: Does it look right to you?
If so, I will test it.

       diagadd=zero
        account_for_corr_obs = .false.
        iii=0
        varinv0=zero
        do ii=1,nchanl
           m=ich(ii)
           if (varinv(ii)>tiny_r_kind .and. iuse_rad(m)>=1) then
             iii=iii+1
             varinv0(ii)=varinv(ii)
             raterr2(ii)=error0(ii)**2*varinv0(ii)
             if (l_may_be_passive .and. .not. retrieval) then
               if(optconv > zero .and. (iasi .or. cris) .and. iinstr /= -1)then
                 asum=zero
                 do k=1,nsig
                   asum=asum+abs(jacobian(iqs+k,ii))*qs(k)
                 end do
                 diagadd(ii)=optconv*asum
               end if
             end if
           end if
        enddo
        err2 = one/error0**2
        tbc0=tbc
        tb_obs0=tb_obs
!       wgtjo=varinv0  !orig
        wgtjo=varinv   !emily_test
        if (l_may_be_passive .and. .not. retrieval) then
          if(iii>0 .and. iinstr.ne.-1)then
            chan_count=(iii*(iii+1))/2
            if(allocated(rsqrtinv))deallocate(rsqrtinv)
            if(allocated(rinvdiag))deallocate(rinvdiag)
            allocate(rsqrtinv(chan_count))
            allocate(rinvdiag(iii))
            rsqrtinv=zero
            rinvdiag=zero
            account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv0,diagadd,&    !orig
       !    account_for_corr_obs = corr_adjust_jacobian(iinstr,nchanl,nsigradjac,ich,varinv,diagadd,&     !emily_test
                                    tbc,tb_obs,err2,raterr2,wgtjo,jacobian,cor_opt,iii,rsqrtinv,rinvdiag)
            varinv=wgtjo
          endif
        endif

RussTreadon-NOAA commented 1 year ago

@emilyhcliu , your change looks right. I made the following change in setuprad.f90

         err2 = one/error0**2
         tbc0=tbc
         tb_obs0=tb_obs
-        wgtjo=varinv0
+        wgtjo=varinv
         if (l_may_be_passive .and. .not. retrieval) then
           if(iii>0 .and. iinstr.ne.-1)then
             chan_count=(iii*(iii+1))/2

Recompiling develop with this change along with crtm/2.4.0 and running a gdas case yields a fort.207 listing with o-g 01 rad channel-by-channel counts the same as operations.

Additionally, running develop with crtm/2.4.0 and operations with correlated error off yields identical channel-by-channel counts for o-g 01.

It would be good to see what you obtain from your test.

ADCollard commented 1 year ago

I'm still very confused about the statistics that are being shown. I am trying to understand why exactly the passive channels are being rejected.

I looked at the flags in the diagnostic files from /lfs/h2/emc/ptmp/emily.liu/tmp766/gsi-merge-2.2023012800 The combined diag file is at /scratch1/NCEPDEV/da/Andrew.Collard/diag_iasi_metop-c_ges.2023012800.nc4.gsi-merge-2

As far as I can tell the QC Flag is set to 0 for almost every case for the high peaking passive channels. I cannot see where the value of ~2000 passing QC in the GSIstat file is coming from.

What am I misunderstanding?

jderber-NOAA commented 1 year ago

Yes Emily I think that is right. I don't think that the o-g 01 being the same is a good measure of whether or not things are right. Of course, that is the first thing that should be right, but need to look at the passive channels to see if the updating of the bias correction is correct. Is it possible to pass the analysis full covariance matrices with the off diagonals equal to zero and the diagonal set to the obs errors in the namelist file? Using that file should produce nearly identical results (might be some round-off differences) to a run with the correlated error turned off. I think.

ADCollard commented 1 year ago

@emilyhcliu @jderber-NOAA I have an IDL script that can modify the RCov files if you want to test out John's idea.

emilyhcliu commented 1 year ago

@jderber-NOAA @ADCollard @RussTreadon-NOAA

Three experiments (20230128 00Z):

ops ---- This is just a sanity check that the single-cycle GSI script can reproduce the operational results
gfsda.v16_merge ---We observed different departure statistics for passive hyperspectral channels
gfsda.v16_merge_test --- This is the test with modification (set wgtjo=varinv) mentioned above

I ran experiments 1 and 2 on Dogwood a few days ago. NCO took Dogwood away yesterday. So, I repeated experiments 1 and 2 on Cactus, and the new one (experiment 3).

The output can be found on Cactus under: /lfs/h2/emc/ptmp/emily.liu/tmp766

The runs completed the analysis without problems. However, They are having trouble writing out the diagnostic files. Got MPICH error. Investigating...... ! Probably need to update modules to use the latest versions.

OK. the following worked: The nc_diag_cat.x is replaced with ncdiag_cat_serial.x for concatenating diag files.

jderber-NOAA commented 1 year ago

Andrew,

I am trying to look at your question from the code perspective. Let me summarize your confusion. Almost all of the high peaking channels have good flags (as expected since they are above clouds), but only 2000 are being used for the passive bc as indicated by the output. Why? Is this correct?

Can I ask for a few more details - what data are you looking at (iasi?), are the obs errors nonzero as well as qcflag == 0? Any indication of clouds for either the ones used or not used (I guess you just have to count to see if either clouds or not clouds is ~2000)? Are you looking at results after Emily has run the changes above?

ADCollard commented 1 year ago

Andrew,

I am trying to look at your question from the code perspective. Let me summarize your confusion. Almost all of the high peaking channels have good flags (as expected since they are above clouds), but only 2000 are being used for the passive bc as indicated by the output. Why? Is this correct?

Can I ask for a few more details - what data are you looking at (iasi?), are the obs errors nonzero as well as qcflag == 0? Any indication of clouds for either the ones used or not used (I guess you just have to count to see if either clouds or not clouds is ~2000)? Are you looking at results after Emily has run the changes above?

@jderber-NOAA I am looking at IASI diagnostic files. In particular, /scratch1/NCEPDEV/da/Andrew.Collard/diag_iasi_metop-c_ges.2023012800.nc4.gsi-merge-2 on Hera. I am seeing almost all obs passing QC for channel 29, but the statistics in the gsistat file say only 1926 pass. Varinv is 0 for all obs for this channel.

So I am confused about the inconsistency between the QC Flags and the counts reported in the gsistat file.

This is obviously for the merge branch before any bugfixes (which solve the problem). I want to be clear on the mechanism for getting the 1926 value in case it is indicitive of some other issue.

jderber-NOAA commented 1 year ago

Yes, the bug fixes should fix the issue with varinv=0 and almost no high peaking channels passing. What is curious is how they got 1926 to print out - especially since all the varinv = 0. My initial look did not find any way this could happen. Will keep looking.

jderber-NOAA commented 1 year ago

I really see no way for the 1926 number of obs to come out unless the stats array is not initialized to zero. I will add three lines to the beginning of setuprhsall.f90 in my version of the code to zero out the stats arrays (one for rad, one for ozone and one for co) to see if it makes any difference in the output.

jderber-NOAA commented 1 year ago

I think the best way to address the issue that Andrew and I are a bit concerned about is to repeat the run that produced the ~2000 numbers for passive high peaking iasi channels and add a print statement when it counts the values for one of the questionable channels. I would print out varinv(i), n, m, wgtjo(i), varinv0(i) and tiny_r_kind.

jderber-NOAA commented 1 year ago

If you have a script on Hera, I would be happy to make the runs and print outs.

emilyhcliu commented 1 year ago

Recap, Follow-up, and Conclusion

Recap

Issue with IASI - the # of data passed QC for passive channels are much reduced (see comment from @RussTreadon-NOAA ).

Source of the problem comes from the changes for correlated observation error calculation in setuprad.f90 (see comment from @emilyhcliu)

Proposed solution is the one-line change as the following:

         err2 = one/error0**2
         tbc0=tbc
         tb_obs0=tb_obs
-        wgtjo=varinv0
+        wgtjo=varinv
         if (l_may_be_passive .and. .not. retrieval) then
           if(iii>0 .and. iinstr.ne.-1)then
             chan_count=(iii*(iii+1))/2

This one-line change should only impacts the results for hyperspectral IR sensors (IASI and CrIS)

Follow-up

Performed three stand-alone experiments for 2023012100 cycle: The output are can be found on Dogwood: /lfs/h2/emc/ptmp/emily.liu/tmp766/

ops (gfsda.v16)

This is a sanity check to verify the GSI single-cycle script used can reproduce operational results
Verification indicates that theops run can reproduce the operational results. So, the script is good for making further test runs.

merge (gfsda.v16 + develop)

It is expected that the first outer loop results for all satellite radiance data should be identical to the ops run
The merge run verifies that the first outer loop statists for satellite radiance data are identical to the ops run except for passive hyperspectral channels.

merge-test (gfsda.v16 + develop + proposed one-line change)

It is expected that the one-line code change will make sure the IASI/CrIS results in the first-outer loop for both active and passive channels are identical to ops results.
Compare the first-outer loop results between ops and merge-test showed identical results.

ops

   99   16 iasi_metop-c      10180     12      1.380  -0.2711876  -0.0529533   0.3718680   0.4215291   0.4181898
  100   29 iasi_metop-c      10189      1     -0.810  -0.0185885  -0.0491907   0.2238049   0.3839516   0.3807875
  101   32 iasi_metop-c       9479      6     -0.750   0.0401685  -0.0214070   0.2826716   0.3987942   0.3982192
  102   35 iasi_metop-c      10190      0     -0.790   0.0156432  -0.0396713   0.2189363   0.3704123   0.3682818
  103   38 iasi_metop-c       9719     22      0.720   0.0924696  -0.0182938   0.3734094   0.3771003   0.3766563
  104   41 iasi_metop-c      10190      0     -0.740   0.0589739  -0.0419500   0.2322487   0.3573792   0.3549086
  105   44 iasi_metop-c       9921      6     -0.680   0.1108667  -0.0145554   0.2790628   0.3592954   0.3590005
  106   47 iasi_metop-c      10188      2     -0.720   0.1178685  -0.0415547   0.2330430   0.3483068   0.3458191
  107   49 iasi_metop-c      10179     13      0.650   0.1554882  -0.0257505   0.3752137   0.3425692   0.3416000
  108   50 iasi_metop-c       9979      2     -0.650   0.1014349  -0.0157072   0.2944851   0.3528285   0.3524787
  109   51 iasi_metop-c       9831     21      0.650   0.1744826  -0.0200698   0.3728664   0.3490133   0.3484358
  110   53 iasi_metop-c      10190      0     -0.690   0.1375607  -0.0481704   0.2386986   0.3377879   0.3343355
  111   55 iasi_metop-c      10175     17      0.640   0.1765295  -0.0273602   0.3742265   0.3336078   0.3324839
  112   56 iasi_metop-c       9983      5     -0.640   0.0963870  -0.0174977   0.2881476   0.3436569   0.3432112
  113   57 iasi_metop-c       9795     12      0.650   0.1613916  -0.0227391   0.3703186   0.3431522   0.3423979
  114   59 iasi_metop-c      10177     15      0.670   0.1854652  -0.0414190   0.3978062   0.3303865   0.3277800
  115   61 iasi_metop-c      10171     21      0.620   0.2202000  -0.0315641   0.3678953   0.3270270   0.3255002

merge

   99   16 iasi_metop-c      10180     12      1.380  -0.2711876  -0.0529533   0.3718680   0.4215291   0.4181898
  100   29 iasi_metop-c      10189      1     -0.810  -0.0185885  -0.0491907   0.2238049   0.3839516   0.3807875
  101   32 iasi_metop-c       9479      6     -0.750   0.0401685  -0.0214070   0.2826716   0.3987942   0.3982192
  102   35 iasi_metop-c      10190      0     -0.790   0.0156432  -0.0396713   0.2189363   0.3704123   0.3682818
  103   38 iasi_metop-c       9719     22      0.720   0.0924696  -0.0182938   0.3734094   0.3771003   0.3766563
  104   41 iasi_metop-c      10190      0     -0.740   0.0589739  -0.0419500   0.2322487   0.3573792   0.3549086
  105   44 iasi_metop-c       9921      6     -0.680   0.1108667  -0.0145554   0.2790628   0.3592954   0.3590005
  106   47 iasi_metop-c      10188      2     -0.720   0.1178685  -0.0415547   0.2330430   0.3483068   0.3458191
  107   49 iasi_metop-c      10179     13      0.650   0.1554882  -0.0257505   0.3752137   0.3425692   0.3416000
  108   50 iasi_metop-c       9979      2     -0.650   0.1014349  -0.0157072   0.2944851   0.3528285   0.3524787
  109   51 iasi_metop-c       9831     21      0.650   0.1744826  -0.0200698   0.3728664   0.3490133   0.3484358
  110   53 iasi_metop-c      10190      0     -0.690   0.1375607  -0.0481704   0.2386986   0.3377879   0.3343355
  111   55 iasi_metop-c      10175     17      0.640   0.1765295  -0.0273602   0.3742265   0.3336078   0.3324839
  112   56 iasi_metop-c       9983      5     -0.640   0.0963870  -0.0174977   0.2881476   0.3436569   0.3432112
  113   57 iasi_metop-c       9795     12      0.650   0.1613916  -0.0227391   0.3703186   0.3431522   0.3423979
  114   59 iasi_metop-c      10177     15      0.670   0.1854652  -0.0414190   0.3978062   0.3303865   0.3277800

We also expect that the one-line code change will not impact analysis results between merge and merge-test.
Comparison the gnorm values for all iterations between the merge and merge-test indicates that the analysis is identical
Here is the J term for the first and second outer loops.

merge - 1 st outer-loop

      J term                                     J  
 excess   moisture            3.3617174390229632E-01
 surface pressure             1.1684761408237271E+04
 temperature                  8.7778766100908251E+04
 wind                         2.2279476994370273E+05
 moisture                     8.5225212424066685E+03
 sst                          3.6552295319311052E+03
 ozone                        2.0066137384165482E+04
 gps bending angle            3.4374390194896946E+05
 radiance                     1.1654059537320442E+06
 tcp (tropic cyclone)         5.1955232132715246E-01

merge - 2nd outer-loop

      J term                                     J  
 background                   8.4539991157273762E+04
 excess   moisture            4.1792812809399271E-01
 surface pressure             9.8791469421266429E+03
 temperature                  5.9048973517868413E+04
 wind                         1.5874273505525949E+05
 moisture                     5.7353937858645750E+03
 sst                          1.5587063605920634E+03
 ozone                        1.2523480289180294E+04
 gps bending angle            2.0457544809380785E+05
 radiance                     9.1181695034682914E+05
 tcp (tropic cyclone)         3.2765719758239818E-03

merge-test - 1st outer loop

     J term                                     J
excess   moisture            3.3617174390229632E-01
surface pressure             1.1684761408237271E+04
temperature                  8.7778766100908251E+04
wind                         2.2279476994370273E+05
moisture                     8.5225212424066685E+03
sst                          3.6552295319311052E+03
ozone                        2.0066137384165482E+04
gps bending angle            3.4374390194896946E+05
radiance                     1.1654059537320442E+06
tcp (tropic cyclone)         5.1955232132715246E-01

merge-test - 2nd outer-loop

     J term                                     J
background                   8.4539991157273762E+04
excess   moisture            4.1792812809399271E-01
surface pressure             9.8791469421266429E+03
temperature                  5.9048973517868413E+04
wind                         1.5874273505525949E+05
moisture                     5.7353937858645750E+03
sst                          1.5587063605920634E+03
ozone                        1.2523480289180294E+04
gps bending angle            2.0457544809380785E+05
radiance                     9.1181695034682914E+05
tcp (tropic cyclone)         3.2765719758239818E-03

Conclusion:

The one-line code change worked as expected:

Fix the QC for passive channels
Does not impact the analysis result

emilyhcliu commented 1 year ago

Working on updated regression tests --- adding ta2tb=.true. and cao_check=.true. to the namelist

NOAA-EMC / GSI