NOAA-EMC / HDASApp

GNU General Public License v3.0
0 stars 3 forks source link

Test the UFO of SFCSHP with HAFS #6

Open JingCheng-NOAA opened 5 months ago

JingCheng-NOAA commented 5 months ago

To record the process of assimilating SFCSHP using JEDI 3DEnVar using HAFS background

JingCheng-NOAA commented 5 months ago

The file contains observations: airTemperature, seaSurfaceTemperature, specificHumidity, stationElevation, stationPressure, virtualTemperature, windEastward, windNorthward.

SFCSHP_obs_full

JingCheng-NOAA commented 5 months ago

When cropped the data to fit within the model domain, the datatype of 'stationIdentification' @ MetaData group tend out to be 'O'. Not sure if this the correct, because when use 'ncdump' to check the IODA file, it listed as type of 'string'. So manually convert this variable from 'object' into 'string' and write out as NC file. Though the available observation number reduced about 1/3, but the file size increased 3 times due to this variable type change. SFCSHP_obs_cropped

JingCheng-NOAA commented 5 months ago

First attempt run with the cropped data encountered an error that looking for 'MetaData/stationElevation'. This should be something wrong with JEDI code since the observation NC file does not contain 'MetaData/stationElevation'. 'stationElevation' exists in group 'obsType' and 'obsValue'.

JingCheng-NOAA commented 5 months ago

Encountered error in analysis by complaining missing variable "MetaData/stationElevation". The current bufr2ioday.py creates "MetaData/heightOfStation". The python script is modified to change this variable name to "MetaData/stationElevation" so JEDI can recognize the variable. The analysis manage to run through after the change.

JingCheng-NOAA commented 5 months ago

Three variables are simulated [stationPressure, airTemperature, specificHumidity] obs_airTemperature hofx0_airTemperature obsomb_airTemperature

JingCheng-NOAA commented 5 months ago

obs_specificHumidity hofx0_specificHumidity obsomb_specificHumidity

JingCheng-NOAA commented 5 months ago

obs_stationPressure hofx0_stationPressure obsomb_stationPressure

JingCheng-NOAA commented 5 months ago

Though hofx0 and OMBG looks reasonable, the increment is all 0 and it seems like all the observations are marked as missing Value according to the log file. However, the latest test removing all the observation filter still have the same result. None of the observations are used in the analysis.

JingCheng-NOAA commented 5 months ago

The 0 increment is due to lack of observation error information. Use gdas issue#82 as a reference to modify the YAML file by adding in and inflating the observation errors.

JingCheng-NOAA commented 5 months ago

Start a single observation experiment to test on the observation error for the temperature (obstype: 180) SFCSHP_singleobs_on_domain

Assign initial error of 2.5 K

      - filter: Perform Action 
        filter variables:
        - name: airTemperature
        where:
        - variable: ObsType/airTemperature
          is_in: 180
        action:
          name: assign error
          error parameter: 2.5
        defer to post: true

ObsValue: 300.4 hofx: 296.3624 omb: 4.037627 oman: 3.947423

Increment on the model bottom

SFCSHP_singleobs_T_increment_lev65

JingCheng-NOAA commented 5 months ago

The bufr2ioda.py will generate a file with QualityMarker group of variables. However, it's been tested to prove that to use ObsFunction/ObsErrorFactorConventional for error inflation, PreQC is the only variable name that being accepted by ObsErrorFactorConventional function. Thus, the bufr2ioda.py or yaml need to be modified to make sure change the group QualityMarker into PreQC

JingCheng-NOAA commented 5 months ago

Praveen suggest to add following in the YAML file for this QualityMarker and PreQC issue temparorly.

obs prior filters:
# Create PreQC group variable
  - filter: Variable Assignment
    assignments:
    - name: PreQC/stationPressure
      type: int
      source variable: QualityMarker/stationPressure
JingCheng-NOAA commented 5 months ago

After several test, the version of obs prior filters works for the SFCSHP data is following block (and yes, JEDI is looking for MetaData/height if this prior filter is turned on ):

obs prior filters:
         - filter: Variable Assignment
           assignments:
           - name: PreQC/airTemperature
             type: int
             source variable: QualityMarker/airTemperature
           - name: MetaData/height
             type: float
             source variable: MetaData/stationElevation
JingCheng-NOAA commented 4 months ago

A comparison with GSI showed difference in hofx is ~0.29, 0.1% 10ens_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

Observation error is also off for jedi fv3jedi_vs_gsi_validation_180_airTemperature

delippi commented 4 months ago

Hi @JingCheng-NOAA, I think you are almost there. What obs operator in JEDI is used for this data type? And how is the interpolation done in GSI (e.g., 3dintrp)?

JingCheng-NOAA commented 4 months ago

Hi @JingCheng-NOAA, I think you are almost there. What obs operator in JEDI is used for this data type? And how is the interpolation done in GSI (e.g., 3dintrp)?

Hi @delippi , Thanks for checking up. In JEDI, I use the vertInterp just as you did for the Mesonet obs.

           vertical coordinate: air_pressure
           observation vertical coordinate: pressure
           observation vertical coordinate group: MetaData
           interpolation method: log-linear

in GSI as I checked in the setupt.f90, I believe it is using the tintrp2al for this type (type:180)

delippi commented 4 months ago

Hi @JingCheng-NOAA, sorry for the slow response here. If you are using tintrp2, then I think you should be using the Identity operator in JEDI. The Identity operator itself is creating hofx's which are exactly the same thing as geovals (geovals are the 2D interpolated model values to ob locations). Recall that I only modified the vertInterp to be able to use the GSI-geovals and I wasn't sure how to implement that in the Identity operator. What you could try doing just to be sure is just hard code some of the GSI-geovals into the JEDI-Identity operator.

Make sure you're using the Identity, but I think you will still expect similar differences like you have shown above unless you hard code the GSI-geovals into the Identity operator for this single ob test.

JingCheng-NOAA commented 4 months ago

@delippi , I noticed in your demonstration document, you have following section in your vertInterp operator:

             filename: "filtered_sfc_tsen_geoval_2022052619.nc4"
             levels_are_top_down: False

Is this the part you mentioned that can use the GSI-geovals in JEDI-vertInterp? Sounds like this part requires hard code in JEDI, correct? Can I see how you did it?

delippi commented 4 months ago

@JingCheng-NOAA, I tried putting the changes into a branch on ufo, but it seems I'm not able to do that. You could look under my demo directory that I had made for Hera: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2_demo/RDASApp.20240620/sorc/ufo to see the modifications that I made.

I followed what Dan did here for gnssro: https://github.com/JCSDA-internal/ufo/compare/feature/feed_gsi_geovals_to_gnssro?expand=1

It wasn't clear to me how this could be done for the Identity operator though. I'm not sure if you will be able to follow what has been done for these two operators and apply it to the Identity operator. My suggestion is to, in a single temperature ob test, put some print statments in the JEDI code RDASApp/sorc/ufo/src/ufo/operators/identity/ and maybe elsewhere to see if you can print out the temperature geovals coming from JEDI. Then you could just hardcode the GSI-geoval for that temperature ob and it should give one-to-one match. If you can show that then you can be certain that you have everything related to the operator correct and the differences are due to differences in GSI vs JEDI geovals (i.e., 4 vs 3 grid point interpolation for geovals).

JingCheng-NOAA commented 4 months ago

@JingCheng-NOAA, I tried putting the changes into a branch on ufo, but it seems I'm not able to do that. You could look under my demo directory that I had made for Hera: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2_demo/RDASApp.20240620/sorc/ufo to see the modifications that I made.

I followed what Dan did here for gnssro: https://github.com/JCSDA-internal/ufo/compare/feature/feed_gsi_geovals_to_gnssro?expand=1

It wasn't clear to me how this could be done for the Identity operator though. I'm not sure if you will be able to follow what has been done for these two operators and apply it to the Identity operator. My suggestion is to, in a single temperature ob test, put some print statments in the JEDI code RDASApp/sorc/ufo/src/ufo/operators/identity/ and maybe elsewhere to see if you can print out the temperature geovals coming from JEDI. Then you could just hardcode the GSI-geoval for that temperature ob and it should give one-to-one match. If you can show that then you can be certain that you have everything related to the operator correct and the differences are due to differences in GSI vs JEDI geovals (i.e., 4 vs 3 grid point interpolation for geovals).

Thank you! I will give it a try.

HuiLiu-NOAA commented 4 months ago

@JingCheng-NOAA : another alternative and easier approach might be to change the location of the OBS to somewhere small-scale feature may be smaller, e.g., the interior of Atlantic or the center of Gulf of Mexican. At these locations the impact due to the different JEDI/GSI horizontal interpolations would be smaller, so the hofx from JEDI and GSI could be closer.

JingCheng-NOAA commented 4 months ago

Switched the observation operator from VertInterp to Identity as suggested by @delippi. Before modifying JEDI code to add in the GeoVal, I did launched a test to see if there is any change in the hofx with Identity obs operator. Identity_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64 It seems like both obs operators are calculating the same hofx.

JingCheng-NOAA commented 4 months ago

Another test is made with @HuiLiu-NOAA 's suggestion to move the single observation from near land to the center of Gulf of Mexican. In this test, observation operator is VertInterp The result seems better, the hofx created by JEDI are similar to what GSI created. MidOcean_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64 More work need to be done on the observation error, though MidOcean_fv3jedi_vs_gsi_validation_180_airTemperature

HuiLiu-NOAA commented 4 months ago

@JingCheng-NOAA : nice test. I guess next step is to figure out why the final OBS error is changed. May be helpful to print out the initial OBS error too.

JingCheng-NOAA commented 4 months ago

@JingCheng-NOAA : nice test. I guess next step is to figure out why the final OBS error is changed. May be helpful to print out the initial OBS error too.

Thank you! I will start from checking the initial OBS error first.

JingCheng-NOAA commented 4 months ago

The initial error assigned in JEDI yaml file was 2.5. Changed it into the same number 0.9729 that GSI was using (read from errtable). The latest results show a better agreement on final observation error. initialerr_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64 initalerr_fv3jedi_vs_gsi_validation_180_airTemperature

HuiLiu-NOAA commented 4 months ago

@JingCheng-NOAA : excellent result!

delippi commented 4 months ago

Hi @JingCheng-NOAA, thanks for doing these tests. The plot you showed where you switched to Identity operator over Louisiana I think shows that is the correct operator to use in JEDI and that also aligns with tintrp2* used in GSI for this data type. I would expect this 0.1% hofx diff due to differences in how JEDI and GSI compute geovals (i.e., there is a slight difference in tintrp2* and Identity in the number of grid points used for interpolation).

You then show a figure where you use the vertInterp again but do it over the middle of the Gulf of Mexico; can you do that same test but with Identity operator? And did you pass GSI-geovals into the test with vertInterp? If you did pass GSI-geovals into vertInterp then the hofxs should match exactly if that is the correct operator which I don't think it is. Don't worry that the Identity doesn't match exactly (unless you found a way to pass GSI-geovals into it). My worry is that this could be "too simple" of a case and vertInterp works but won't work in say your Louisiana case. I think generally it is good to do single ob tests in the more complex location to not get false positives for certain settings.

As for the oberror, you should also be able to get this to match exactly. You can hard code some GSI-geovals for model_pressure_sfc[iloc] and prsl[0][iloc] and maybe one more level prsl[1][iloc] (remember that JEDI and GSI have data in reverse order). You can make that change in the following code: ufo/filters/obsfunctions/ObsErrorFactorPressureCheck.cc

JingCheng-NOAA commented 4 months ago

Hi @JingCheng-NOAA, thanks for doing these tests. The plot you showed where you switched to Identity operator over Louisiana I think shows that is the correct operator to use in JEDI and that also aligns with tintrp2* used in GSI for this data type. I would expect this 0.1% hofx diff due to differences in how JEDI and GSI compute geovals (i.e., there is a slight difference in tintrp2* and Identity in the number of grid points used for interpolation).

You then show a figure where you use the vertInterp again but do it over the middle of the Gulf of Mexico; can you do that same test but with Identity operator? And did you pass GSI-geovals into the test with vertInterp? If you did pass GSI-geovals into vertInterp then the hofxs should match exactly if that is the correct operator which I don't think it is. Don't worry that the Identity doesn't match exactly (unless you found a way to pass GSI-geovals into it). My worry is that this could be "too simple" of a case and vertInterp works but won't work in say your Louisiana case. I think generally it is good to do single ob tests in the more complex location to not get false positives for certain settings.

As for the oberror, you should also be able to get this to match exactly. You can hard code some GSI-geovals for model_pressure_sfc[iloc] and prsl[0][iloc] and maybe one more level prsl[1][iloc] (remember that JEDI and GSI have data in reverse order). You can make that change in the following code: ufo/filters/obsfunctions/ObsErrorFactorPressureCheck.cc

Thank you Donni for the comments and suggestions. I will do another test with Identity for obs in middle of Gulf. And no I didn't pass the GSI-geovals into the test with VertInerp with the obs in new location.

JingCheng-NOAA commented 4 months ago

Use operator Identity for single observation over the middle of the Gulf of Mexico. Result is identical to the one using VertInterp. Identity_locationchanged_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

HuiLiu-NOAA commented 4 months ago

@JingCheng-NOAA : This is interesting. It may suggest that the vertical interpolation operator is just using the lowest level of the model, without any interpolation for the surface observations.

delippi commented 4 months ago

Yeah, and I just realized you DID do the vertInterp vs Identity test in over Louisiana. Do you mind pointing me to your run directory?

JingCheng-NOAA commented 4 months ago

/scratch1/NCEPDEV/hwrf/scrub/Jing.Cheng/jediwork/test/hafs-data_fv3jedi_2020082512/

The yaml file I used is SFCSHP_singleob_airTemperature_fv3jedi.yaml And the observation I used is in obsout/sfcship_tsen_obs_2020082512.nc4

Thanks for help checking!

delippi commented 4 months ago

Hi @JingCheng-NOAA, I double checked the GSI for which interpolation method is used for obtype 180. Since SFCMODEL=F, I thought GSI was probably using "SCENARIO 3b" in setupt. It appears to also use 3a when you have a virtual temperature ob (it seems there might be mixed sensible and virtual temperature obs for surface ship?). I confirmed this with print statements in GSI. Since GSI is using 3a/b, it is actually using tintrp31 (not tintrp2*) so vertInterp is actually the correct observation operator. Then, I was able to feed the GSI-geovals into vertInterp and got matching results compared to your GSI plots (for the case over the Gulf).

I would say you could try rerunning the case over LA and try feeding the GSI-geovals. Make sure you are feeding geovals like:

           gsi geovals:
             filename: "obsout/sfcship_tsen_geoval_2020082512.nc4"
             levels_are_top_down: False

In your run_fv3jedi.sh script, you can set the following to just use my version that can feed GSI-geovals into vertInterp:

jedibin="/scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2_demo/RDASApp.20240620/build/bin"

If you get an error about not having saturation_specific_humidity or surface_geometric_height, you can use this tool: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/tools/update_geovals.py after you recreate your GSI-geovals for the LA case.

JingCheng-NOAA commented 4 months ago

Hi @JingCheng-NOAA, I double checked the GSI for which interpolation method is used for obtype 180. Since SFCMODEL=F, I thought GSI was probably using "SCENARIO 3b" in setupt. It appears to also use 3a when you have a virtual temperature ob (it seems there might be mixed sensible and virtual temperature obs for surface ship?). I confirmed this with print statements in GSI. Since GSI is using 3a/b, it is actually using tintrp31 (not tintrp2*) so vertInterp is actually the correct observation operator. Then, I was able to feed the GSI-geovals into vertInterp and got matching results compared to your GSI plots (for the case over the Gulf).

I would say you could try rerunning the case over LA and try feeding the GSI-geovals. Make sure you are feeding geovals like:

           gsi geovals:
             filename: "obsout/sfcship_tsen_geoval_2020082512.nc4"
             levels_are_top_down: False

In your run_fv3jedi.sh script, you can set the following to just use my version that can feed GSI-geovals into vertInterp:

jedibin="/scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2_demo/RDASApp.20240620/build/bin"

If you get an error about not having saturation_specific_humidity or surface_geometric_height, you can use this tool: /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/tools/update_geovals.py after you recreate your GSI-geovals for the LA case.

Thank you so much for checking it up. I will rerun this with your JEDI and see how it goes.

JingCheng-NOAA commented 4 months ago

Use GSI GeoVaLs in VertInterp using the JEDI build by Donnie show the exact same value hofx gsiGeoVal_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

delippi commented 4 months ago

Great! I think this is good enough for hofx and oberror validation purposes. The ob errors are close enough and you could show that they would match exactly if you also passed the GSI-geovals into the obserror function too. No need to go down that route.

I'm a little confused by the final increment plot. With everything else identical, JEDI has an order of magnitude larger increment AND has (barely) higher final ob error. The increments should be comparable. This was also something notable in your previous plots for the LA case as well. I wonder what is going on there.

JingCheng-NOAA commented 4 months ago

Great! I think this is good enough for hofx and oberror validation purposes. The ob errors are close enough and you could show that they would match exactly if you also passed the GSI-geovals into the obserror function too. No need to go down that route.

I'm a little confused by the final increment plot. With everything else identical, JEDI has an order of magnitude larger increment AND has (barely) higher final ob error. The increments should be comparable. This was also something notable in your previous plots for the LA case as well. I wonder what is going on there.

Thanks for all the valuable advices for the hofx valication. About the increment plot, I was curious too and compared it with my early experiment. I kind of feel that the background covariance we current using cause a large impact on the increment. In this test case I am using 10 GDAS ensemble members (interpolated onto HAFS grids) to provide background covariance. The number of the ensemble member has a large impact on the increments. Below shows the increments difference when use 3, 5, 6 ensemble members from GDAS. You can see the magnitude varies huge. Meanwhile, the bump NICAS files that provided the localization may need more tune as well.

*3 GDAS member 3ens_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

*5 GDAS member 5ens_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

*6 GDAS member 6ens_fv3jedi_vs_gsi_increment_180_airTemperature_lvl64

JingCheng-NOAA commented 3 months ago

Change test case to 2024 Beryl02L. The observation is placed near the hurricane. Adjust the localization scale in BUMP NICAS to check the impact on increment, and compared with GSI analysis.

JingCheng-NOAA commented 3 months ago

Based on the horz: 200km vert:1e2 resolution8, plot lvl from 81 to 79. Comparing with GSI results show the largest increment in GSI is on the second level, not the ground one. @ShunLiu-NOAA mentioned in HAFS DA meeting today about GSI has QC functions to do height correction for surface observations, while JEDI currently doesn't have them. Since this single observation (temperature, 187) I added is on 1000hPa, this could be the reason we saw here as JEDI has larger increment on the 1st level of model while GSI barely has any increment. I will leave this work as it is for now. As more surface level correction functions are included in JEDI, I will revisit this type of observation then. 200km_1e2vert_fv3jedi_vs_gsi_increment_187_airTemperature_lvl80 fv3jedi_vs_gsi_increment_187_airTemperature_lvl79 fv3jedi_vs_gsi_increment_187_airTemperature_lvl78 fv3jedi_vs_gsi_increment_187_airTemperature_lvl77