NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1
15 stars 31 forks source link

[End-to-End Test Code Sprint] Update GNSSRO converter and YAML #750

Open XuanliLi-NOAA opened 10 months ago

XuanliLi-NOAA commented 10 months ago

GNSSRO API updates:

Updated the pccf flag to use the profile pccf for CDAAC RO data Updated the qfro flag to use the profile qfro for EUMETSAT RO data Updated the satelliteAscending flag: convert the 3rd bit of qfro to satelliteAscending flag Updated some information in the json file Added QC filters in config gnssro yaml.

XuanliLi-NOAA commented 10 months ago

GSI run obs vs. hofx plot before QC: scatter_obs_hofx_gsi

GSI run obs vs. hofx plot after QC: scatter_obs_hofx_gsi_qc

XuanliLi-NOAA commented 10 months ago

JEDI run obs vs. hofx plot before QC: scatter_obs_hofx0_fv3

JEDI run obs vs. hofx plot after applying pccf and qfro: scatter_obs_hofx0_fv3_pc

JEDI run obs vs. hofx plot after all QCs: scatter_obs_hofx0_fv3_qc

XuanliLi-NOAA commented 10 months ago

GSI run hofx vs. JEDI run hofx before QCs: scatter_hofxgsi_hofx0fv3

GSI run hofx vs. JEDI run hofx after QCs: scatter_hofxgsi_hofx0fv3_qc

There is an apparent cutoff near obs = 0.03 rad in the JEDI run. Additionally, 74060 more data are being filtered in the JEDI run (320029) compared to GSI (394089).

XuanliLi-NOAA commented 10 months ago

Inconsistent impact parameter was found between GSI and JEDI. When this variable was obtained from the GPSRO bufr file during the bufr query, it has been rounded into either 0.0 or 0.5. This works fine for most satellites but not for the GFZ satellites (Tandem, TerraSar, and GRACE FO). For 2021080100, there are 2 profiles from TerraSar. Below are some the sample data: image

XuanliLi-NOAA commented 8 months ago

A few changes in bufr2ioda_gnssro.py:

  1. Added sequenceNumber and added sorting of impactHeight according to sequenceNumber
  2. Using pccf and qfro flags in the header information of each bufr message in stead of the one in ROSEQ2 and add it in the yaml file
  3. Added satelliteAscendingFlag

obs vs. JEDI hofx before application of pccf/qfro check: scatter_obs_hofx0_fv3

obs vs. JEDI hofx after application of pccf/qfro check overplotted by GSI hofx before other QCs: scatter_obs_hofx_gsi_fv3

There are 5613 (out of 502758) more data being retained by JEDI than GSI. Some of the large obs bending angle (> 0.042 Rad) are rejected because they are below model level 3 (QC=15), while GSI retained these data.

obs vs. JEDI/GSI hofx after QC: scatter_obs_hofx_gsi_fv3_qc There are 4818 out of 394089 (1.2%) more data being retained by JEDI than GSI.

hofx JEDI vs. hofx GSI before QC: scatter_hofxgsi_hofx0fv3 There are some data with large difference in hofx.

hofx JEDI vs. hofx GSI after QC: scatter_hofxgsi_hofx0fv3_qc Most of the data with really large hofx difference are filtered by QC.

Percent of hofx difference between JEDI and GSI vs. hofx GSI after QC: scatter_hofxgsi_hofx0fv3_qc_diff There are 720 out of 392874 (0.18%) having hofx difference > 2%. I'm looking at the T, Q, and Hgt for the difference.

XuanliLi-NOAA commented 8 months ago

of count for different QC checks:

Kompsat-5 (SAID=825): bar_qc_825 4 out of 2890 more data are retained by JEDI than GSI.

GeoOptics(SAID=265): bar_qc_265 30 out of 32745 (0.1%) less data are retained by JEDI than GSI. Most of them are due to cutoff check

COSMIC-2: bar_qc_750 bar_qc_751 bar_qc_752 bar_qc_753 bar_qc_754 bar_qc_755 341 out of 247345 (0.12%) more data are retained by JEDI than GSI due to cutoff check, ddnj, and SR1.

MetOp: bar_qc_3 bar_qc_4 bar_qc_5 The differences in QC data count are 3, 13, and 3, for MetOp-B, MetOp-A, and MetOp-C respectively. These are due to metop-8km check, ddnj, and SR1.

PAZ: bar_qc_44 5 out of 5810 more data are retained by JEDI than GSI.

TerraSar: bar_qc_42 1 out of 416 less data are retained by JEDI than GSI.

XuanliLi-NOAA commented 8 months ago

Effective Error for data passed QC: scatter_obserr_final

XuanliLi-NOAA commented 7 months ago

Geopotential Height JEDI vs. GSI

scatter_gsi_fv3_gph

Large differences in Geopotential Height occur for data with GSI_qc = 1, or 7 and JEDI_qc = 12, 13, 15, or 19. When these QCs are applied, the data was cycled and the geopotential height was not computed correctly.

The figure below shows geopotential height difference for data with large Hofx difference (>2%) and QC = 0:

scatter_gsi_fv3_diffhgt_large_2

XuanliLi-NOAA commented 7 months ago

Temperature

scatter_gsi_fv3_temp

The same thing happens to temperature field. The figure below is temperature difference for data with large HofX difference (> 2%) and QC = 0 (722/392874): scatter_gsi_fv3_difftemp_large_2

XuanliLi-NOAA commented 7 months ago

Specific humidity: scatter_gsi_fv3_sh

The figure below is specific humidity difference for data with large HofX difference (> 2%) and QC = 0 (722/392874): scatter_gsi_fv3_diffsh_large_2

XuanliLi-NOAA commented 7 months ago

Location of obs with geopotential height difference larger than 200 m scatter_gph_loc Location of obs with temperature difference greater than 0.2 deg scatter_temp_loc

danholdaway commented 5 months ago

@XuanliLi-NOAA @ADCollard @emilyhcliu I've done a bit of digging around with some results below. Though I'm not sure I've found anything especially conclusive but perhaps after some discussion we can plan other things to look at. In order to investigate I added some code to UFO to read the GSI GeoVaLs when running h(x) from fv3-jedi. Basically the code lets you pass both fv3-jedi and gsi GeoVaLs to the operator at the same time and then you can choose which system provides specific variables. I conducted several experiments, altering which variables come from which system:

exp_a:
  gsi:  [t, q, p, gph, sgph]
  jedi: []

exp_b:
  gsi:  [q, p, gph, sgph]
  jedi: [t]

exp_c:
  gsi:  [q, gph, sgph]
  jedi: [t, p]

exp_d:
  gsi:  [q, gph]
  jedi: [t, p, sgph]

exp_e:
  gsi:  [q]
  jedi: [t, p, gph, sgph]

exp_f:
  gsi:  []
  jedi: [t, q, p, gph, sgph]

exp_g:
  gsi:  [t, p, gph, sgph]
  jedi: [q]

exp_h:
  gsi:  [t, q, p]
  jedi: [gph, sgph]

The following plots show hofx from JEDI vs h(x) from GSI. Blue points are all data and red are passing JEDI QC. In each the left plot shows the differnece (as a percentage) vs the GSI values. The right shows the same metric against impact height, all the problems seem to occur for lowest impact heights (<10km).

exp_a, all GeoVaLs from GSI (sanity check on the code):

jedi_vs_gsi_exp_a

exp_f, all GeoVaLs from JEDI:

jedi_vs_gsi_exp_f

exp_h, height variable from JEDI:

jedi_vs_gsi_exp_h

Differences in height GeoVaLS don't seem to be a big contributor.

Experiment g is interesting. Where only specific humidity comes from JEDI. That recovers what looks like all the problems of having everything come from JEDI.

jedi_vs_gsi_exp_g

When I plot specific humidity GeoVaLs (GSI) vs GeoVaLs (JEDI) at all levels there is quite a bit of spread:

Screenshot 2024-04-15 at 11 05 00

So perhaps this is the closest thing to a smoking gun to look into. We could look through the pathways by which that variable is derived and sent to the operator. Perhaps there are differences between JEDI and GSI. In JEDI there is no variable change on humidity, it's from restart to operator. But perhaps in GSI it is derived?

danholdaway commented 5 months ago

I suppose specific humidity is a variable with small scales and perhaps we have to accept this degree of difference in the GeoVaLs as the interpolation returns different things, neither right or wrong. If the bending angle calculation has a high degree of sensitivity to humidity then we cannot expect similar results to GSI. Have we looked at aircraft humidities yet?

XuanliLi-NOAA commented 5 months ago

@danholdaway Thank you so much for conducting the experiments and looking into the differences. Have you done an experiment with only temperature from JEDI? I'm on travel this week to the ROMEX workshop, I'll have more time in next week.

danholdaway commented 5 months ago

@danholdaway Thank you so much for conducting the experiments and looking into the differences. Have you done an experiment with only temperature from JEDI? I'm on travel this week to the ROMEX workshop, I'll have more time in next week.

Yes, that was experiment 'b':

jedi_vs_gsi_exp_b

danholdaway commented 5 months ago

If you want some other combination of variable let me know and I can run it.

danholdaway commented 5 months ago

I should also add 'e', the inverse of 'g' where everything comes from JEDI, except specific humidity, which comes from GSI:

jedi_vs_gsi_exp_e

XuanliLi-NOAA commented 5 months ago

@danholdaway Thank you! I agree with you, it appears that specific humidity is causing a much larger difference than other variables.

ADCollard commented 5 months ago

This is an example profile for radiosondes. This one has a 100% error at ~400hPa. Can this be an interpolation artifact?

Screenshot 2024-04-25 at 12 00 31

This sonde was launched from 36.83N, 10.23E on 2021080100.

RussTreadon-NOAA commented 5 months ago

Picking up on @danholdaway ' s comment

We could look through the pathways by which that variable is derived and sent to the operator. Perhaps there are differences between JEDI and GSI. In JEDI there is no variable change on humidity, it's from restart to operator. But perhaps in GSI it is derived?

specific humidity and water vapor mixing ratio are similar but different. Could a conversion be occurring somewhere? Could code be assuming one representation when, in fact, the other representation is used?

danholdaway commented 5 months ago

@ADCollard a bit more digging...

With the help of @DavidNew-NOAA I've compared cubed sphere restarts, cubed sphere history and Gaussian history. There is essentially no difference between the specific humidity in the cube sphere restart and history. The below shows layer 100, tile 2 of the cube. Differences are no more than ~1.0e-4%.

hst_vs_rst_q_tile2_level100

Then I tried interpolating David's cube sphere output to Gaussian using the JEDI interpolation (not the interpolation used to get GeoVaLs but something decent from Atlas). This might show us if there is a high degree of sensitivity to the interpolation for sphum. The plots show the difference (%) in temperature and specific humidity at level 100.

gauss_difference_david_t_100

gauss_difference_david_q_100

I guess there isn't too much surprise here. You see some grid imprinting because of small differences in lat/lon probably. The differences are bigger for specific humidity, presumably because of the sharper horizontal gradients. You might argue that differences of 0.3% are quite large but the character of the differences is quite reasonable and aligned with terrain and cube artifacts for the most part. In this case I think you just have to say "it is what it is".

With the help of @emilyhcliu the other week I tracked down the Gaussian history that was used by GSI for the original runs that go into @XuanliLi-NOAA's evaluation. Then I did the same thing and interpolated the CS restarts to Gaussian. This time however the differences are much larger than we see for the case David just made. And especially the differences for specific humidity are much larger at ~50%. David's case is C96 and Xuanli's case is C768.

gauss_difference_xuanli_t_100

gauss_difference_xuanli_q_100

Here's that specific humidity plot again but with the colorbars limited to 10%.

gauss_difference_xuanli_q_100

What I think stands out is that the character of the differences is altered and the grid imprinting is gone. The differences are still somewhat aligned with terrain but there is a clear bias with most of the difference being red (gauss > cube). So perhaps at the time these cycles were made that went into the UFO validation there was some issue or inconsistency between restarts and histories?

If we want to delve further into this then perhaps the next thing to try is to make a new cycle where we can first check that the cubes are consistent with the Gaussians? I don't think we can infer too much about h(x) differences when there is this degree of difference in the background that each system receives. It seems that it's possible to have backgrounds closer to each other passed to the respective systems, as with the run David made.

@RussTreadon-NOAA could be piggy back off of one of your experiments perhaps? I think we'd just need to turn on the ncdiags from GSI but we could discuss next week.

emilyhcliu commented 5 months ago

Based on the percent differences between the david and xuanli runs, the larger difference (Cubic Sphere vs. Gaussian) is not limited to moisture. The percent difference for temperature is not small.

XuanliLi-NOAA commented 4 months ago

Generate ioda files for each of the GNSS RO mission. Here is COSMIC-2: All obs vs. hofx: scatter_obs_hofx_gsi_fv3 Obs after pccf/qfro flag vs. hofx: scatter_obs_hofx_gsi_fv3_pc Obs after QC filters vs. hofx: scatter_obs_hofx_gsi_fv3_qc Comparison of hofx from GSI vs. JEDI: scatter_hofxgsi_hofx0fv3 Comparison of hofx from GSI vs. JEDI after QC filters: scatter_hofxgsi_hofx0fv3_qc QC filters: 294/274034 more data have been filtered by JEDI than GSI, mainly due to cutoff. bar_qc_cosmic2

danholdaway commented 4 months ago

@XuanliLi-NOAA is this made using the new 2024-02-19 12z run?

XuanliLi-NOAA commented 4 months ago

No, this is still the old date 2021-08-01. I'm trying to make sure things run smoothly with yamls and bufr2ioda for separate satellite, and I'll do the same for the new date when the files are available.