Open XuanliLi-NOAA opened 10 months ago
GSI run obs vs. hofx plot before QC:
GSI run obs vs. hofx plot after QC:
JEDI run obs vs. hofx plot before QC:
JEDI run obs vs. hofx plot after applying pccf and qfro:
JEDI run obs vs. hofx plot after all QCs:
GSI run hofx vs. JEDI run hofx before QCs:
GSI run hofx vs. JEDI run hofx after QCs:
There is an apparent cutoff near obs = 0.03 rad in the JEDI run. Additionally, 74060 more data are being filtered in the JEDI run (320029) compared to GSI (394089).
Inconsistent impact parameter was found between GSI and JEDI. When this variable was obtained from the GPSRO bufr file during the bufr query, it has been rounded into either 0.0 or 0.5. This works fine for most satellites but not for the GFZ satellites (Tandem, TerraSar, and GRACE FO). For 2021080100, there are 2 profiles from TerraSar. Below are some the sample data:
A few changes in bufr2ioda_gnssro.py:
obs vs. JEDI hofx before application of pccf/qfro check:
obs vs. JEDI hofx after application of pccf/qfro check overplotted by GSI hofx before other QCs:
There are 5613 (out of 502758) more data being retained by JEDI than GSI. Some of the large obs bending angle (> 0.042 Rad) are rejected because they are below model level 3 (QC=15), while GSI retained these data.
obs vs. JEDI/GSI hofx after QC: There are 4818 out of 394089 (1.2%) more data being retained by JEDI than GSI.
hofx JEDI vs. hofx GSI before QC: There are some data with large difference in hofx.
hofx JEDI vs. hofx GSI after QC: Most of the data with really large hofx difference are filtered by QC.
Percent of hofx difference between JEDI and GSI vs. hofx GSI after QC: There are 720 out of 392874 (0.18%) having hofx difference > 2%. I'm looking at the T, Q, and Hgt for the difference.
Kompsat-5 (SAID=825): 4 out of 2890 more data are retained by JEDI than GSI.
GeoOptics(SAID=265): 30 out of 32745 (0.1%) less data are retained by JEDI than GSI. Most of them are due to cutoff check
COSMIC-2: 341 out of 247345 (0.12%) more data are retained by JEDI than GSI due to cutoff check, ddnj, and SR1.
MetOp: The differences in QC data count are 3, 13, and 3, for MetOp-B, MetOp-A, and MetOp-C respectively. These are due to metop-8km check, ddnj, and SR1.
PAZ: 5 out of 5810 more data are retained by JEDI than GSI.
TerraSar: 1 out of 416 less data are retained by JEDI than GSI.
Effective Error for data passed QC:
Geopotential Height JEDI vs. GSI
Large differences in Geopotential Height occur for data with GSI_qc = 1, or 7 and JEDI_qc = 12, 13, 15, or 19. When these QCs are applied, the data was cycled and the geopotential height was not computed correctly.
The figure below shows geopotential height difference for data with large Hofx difference (>2%) and QC = 0:
Temperature
The same thing happens to temperature field. The figure below is temperature difference for data with large HofX difference (> 2%) and QC = 0 (722/392874):
Specific humidity:
The figure below is specific humidity difference for data with large HofX difference (> 2%) and QC = 0 (722/392874):
Location of obs with geopotential height difference larger than 200 m Location of obs with temperature difference greater than 0.2 deg
@XuanliLi-NOAA @ADCollard @emilyhcliu I've done a bit of digging around with some results below. Though I'm not sure I've found anything especially conclusive but perhaps after some discussion we can plan other things to look at. In order to investigate I added some code to UFO to read the GSI GeoVaLs when running h(x) from fv3-jedi. Basically the code lets you pass both fv3-jedi and gsi GeoVaLs to the operator at the same time and then you can choose which system provides specific variables. I conducted several experiments, altering which variables come from which system:
exp_a:
gsi: [t, q, p, gph, sgph]
jedi: []
exp_b:
gsi: [q, p, gph, sgph]
jedi: [t]
exp_c:
gsi: [q, gph, sgph]
jedi: [t, p]
exp_d:
gsi: [q, gph]
jedi: [t, p, sgph]
exp_e:
gsi: [q]
jedi: [t, p, gph, sgph]
exp_f:
gsi: []
jedi: [t, q, p, gph, sgph]
exp_g:
gsi: [t, p, gph, sgph]
jedi: [q]
exp_h:
gsi: [t, q, p]
jedi: [gph, sgph]
The following plots show hofx from JEDI vs h(x) from GSI. Blue points are all data and red are passing JEDI QC. In each the left plot shows the differnece (as a percentage) vs the GSI values. The right shows the same metric against impact height, all the problems seem to occur for lowest impact heights (<10km).
exp_a, all GeoVaLs from GSI (sanity check on the code):
exp_f, all GeoVaLs from JEDI:
exp_h, height variable from JEDI:
Differences in height GeoVaLS don't seem to be a big contributor.
Experiment g is interesting. Where only specific humidity comes from JEDI. That recovers what looks like all the problems of having everything come from JEDI.
When I plot specific humidity GeoVaLs (GSI) vs GeoVaLs (JEDI) at all levels there is quite a bit of spread:
So perhaps this is the closest thing to a smoking gun to look into. We could look through the pathways by which that variable is derived and sent to the operator. Perhaps there are differences between JEDI and GSI. In JEDI there is no variable change on humidity, it's from restart to operator. But perhaps in GSI it is derived?
I suppose specific humidity is a variable with small scales and perhaps we have to accept this degree of difference in the GeoVaLs as the interpolation returns different things, neither right or wrong. If the bending angle calculation has a high degree of sensitivity to humidity then we cannot expect similar results to GSI. Have we looked at aircraft humidities yet?
@danholdaway Thank you so much for conducting the experiments and looking into the differences. Have you done an experiment with only temperature from JEDI? I'm on travel this week to the ROMEX workshop, I'll have more time in next week.
@danholdaway Thank you so much for conducting the experiments and looking into the differences. Have you done an experiment with only temperature from JEDI? I'm on travel this week to the ROMEX workshop, I'll have more time in next week.
Yes, that was experiment 'b':
If you want some other combination of variable let me know and I can run it.
I should also add 'e', the inverse of 'g' where everything comes from JEDI, except specific humidity, which comes from GSI:
@danholdaway Thank you! I agree with you, it appears that specific humidity is causing a much larger difference than other variables.
This is an example profile for radiosondes. This one has a 100% error at ~400hPa. Can this be an interpolation artifact?
This sonde was launched from 36.83N, 10.23E on 2021080100.
Picking up on @danholdaway ' s comment
We could look through the pathways by which that variable is derived and sent to the operator. Perhaps there are differences between JEDI and GSI. In JEDI there is no variable change on humidity, it's from restart to operator. But perhaps in GSI it is derived?
specific humidity and water vapor mixing ratio are similar but different. Could a conversion be occurring somewhere? Could code be assuming one representation when, in fact, the other representation is used?
@ADCollard a bit more digging...
With the help of @DavidNew-NOAA I've compared cubed sphere restarts, cubed sphere history and Gaussian history. There is essentially no difference between the specific humidity in the cube sphere restart and history. The below shows layer 100, tile 2 of the cube. Differences are no more than ~1.0e-4%.
Then I tried interpolating David's cube sphere output to Gaussian using the JEDI interpolation (not the interpolation used to get GeoVaLs but something decent from Atlas). This might show us if there is a high degree of sensitivity to the interpolation for sphum. The plots show the difference (%) in temperature and specific humidity at level 100.
I guess there isn't too much surprise here. You see some grid imprinting because of small differences in lat/lon probably. The differences are bigger for specific humidity, presumably because of the sharper horizontal gradients. You might argue that differences of 0.3% are quite large but the character of the differences is quite reasonable and aligned with terrain and cube artifacts for the most part. In this case I think you just have to say "it is what it is".
With the help of @emilyhcliu the other week I tracked down the Gaussian history that was used by GSI for the original runs that go into @XuanliLi-NOAA's evaluation. Then I did the same thing and interpolated the CS restarts to Gaussian. This time however the differences are much larger than we see for the case David just made. And especially the differences for specific humidity are much larger at ~50%. David's case is C96 and Xuanli's case is C768.
Here's that specific humidity plot again but with the colorbars limited to 10%.
What I think stands out is that the character of the differences is altered and the grid imprinting is gone. The differences are still somewhat aligned with terrain but there is a clear bias with most of the difference being red (gauss > cube). So perhaps at the time these cycles were made that went into the UFO validation there was some issue or inconsistency between restarts and histories?
If we want to delve further into this then perhaps the next thing to try is to make a new cycle where we can first check that the cubes are consistent with the Gaussians? I don't think we can infer too much about h(x) differences when there is this degree of difference in the background that each system receives. It seems that it's possible to have backgrounds closer to each other passed to the respective systems, as with the run David made.
@RussTreadon-NOAA could be piggy back off of one of your experiments perhaps? I think we'd just need to turn on the ncdiags from GSI but we could discuss next week.
Based on the percent differences between the david
and xuanli
runs, the larger difference (Cubic Sphere vs. Gaussian) is not limited to moisture. The percent difference for temperature is not small.
Generate ioda files for each of the GNSS RO mission. Here is COSMIC-2: All obs vs. hofx: Obs after pccf/qfro flag vs. hofx: Obs after QC filters vs. hofx: Comparison of hofx from GSI vs. JEDI: Comparison of hofx from GSI vs. JEDI after QC filters: QC filters: 294/274034 more data have been filtered by JEDI than GSI, mainly due to cutoff.
@XuanliLi-NOAA is this made using the new 2024-02-19 12z run?
No, this is still the old date 2021-08-01. I'm trying to make sure things run smoothly with yamls and bufr2ioda for separate satellite, and I'll do the same for the new date when the files are available.
GNSSRO API updates:
Updated the pccf flag to use the profile pccf for CDAAC RO data Updated the qfro flag to use the profile qfro for EUMETSAT RO data Updated the satelliteAscending flag: convert the 3rd bit of qfro to satelliteAscending flag Updated some information in the json file Added QC filters in config gnssro yaml.