NOAA-EMC / RDASApp

Regional DAS
GNU Lesser General Public License v2.1
2 stars 13 forks source link

Incorporate Aircraft DA into JEDI for RRFS #106

Open delippi opened 4 months ago

delippi commented 4 months ago

The UFO in JEDI is the component that not only computes model-simulated observations, but also houses filters and methods for observation QC, ob errors, and bias correction. The GSI observer is the equivalent. Many forward operators for various observations have been developed for the UFO. These operators can be utilized in RDAS. However these operators still must be tested for RDAS. The steps for transition by observation platform are as follows:

  1. Establish IODA processing
  2. Establish operator in UFO
  3. Establish YAML input file
  4. Validate with test assimilation experiments
  5. Create ctest, if necessary

Aircraft data is bufr obtype=30, 31, 33-35

guoqing-noaa commented 4 months ago

@delippi Thanks for doing this!

FYI, @spanNOAA submitted an ioda task to rrfs-workflow earlier this year https://github.com/NOAA-EMC/rrfs-workflow/pull/199. Those yaml files may be re-used here: https://github.com/NOAA-EMC/rrfs-workflow/tree/dev-sci/parm/iodaconv

delippi commented 4 months ago

Hofx and oberrors for 130, 131, 134, and 135 (from AIRCFT) are validated. Followed the following documentation for what is assimilated in RAP: https://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_5.htm

The following tests were completed by feeding the GSI-geovals into vertInterp. There were a few places where the JEDI ob errors don't match the GSI ob errors, but that doesn't seem to be a worry. I didn't look into it beyond what I have shown here, but I would guess it is due to obs being closer to the surface relative to most of the other obs. I have previously shown that ob errors can be different when an ob is below the top 20 or so model levels. This is related to hybrid vcoord and pressure level geoval calculations.

ObType=130 T fv3jedi_vs_gsi_validation_130_airTemperature

ObType=131 T fv3jedi_vs_gsi_validation_131_airTemperature

ObType=134 T fv3jedi_vs_gsi_validation_134_airTemperature

ObType=135 T fv3jedi_vs_gsi_validation_135_airTemperature

ObType=134 q fv3jedi_vs_gsi_validation_134_specificHumidity

ObType=230 u fv3jedi_vs_gsi_validation_230_windEastward

ObType=231 u fv3jedi_vs_gsi_validation_231_windEastward

ObType=234 u fv3jedi_vs_gsi_validation_234_windEastward

ObType=235 u fv3jedi_vs_gsi_validation_235_windEastward

delippi commented 4 months ago

I'm trying to complete hofx and oberror validation for 133/233 (from AIRCAR), but JEDI keeps hanging. I've found that for a few obs, an assertion does not pass in ObsErrorFactorPressureCheck.cc.

36: Assertion failed: logprsl[0] > logprsl[nlevs-1] in compute, line 299 of /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/jedi-assim-aircft/RDASApp.20240620/bundle/ufo/src/ufo/filters/obsfunctions/ObsErrorFactorPressureCheck.cc

I added some print statements before this assertion to print when it is expected to fail and this is the result:

36: iloc               =0
36: logprsl[0]         =-nan
36: logprsl[nlevs-1]   =-nan
36: prsl[0][iloc]      =-3.3688e+38
36: prsl[nlevs-1][iloc]=-3.3688e+38

It is also interesting to note that it fails only when iloc=0, but not every iloc=0 fails. There are at least 7 instances of this in the log file.

delippi commented 4 months ago

@delippi Thanks for doing this!

FYI, @spanNOAA submitted an ioda task to rrfs-workflow earlier this year NOAA-EMC/rrfs-workflow#199. Those yaml files may be re-used here: https://github.com/NOAA-EMC/rrfs-workflow/tree/dev-sci/parm/iodaconv

Hi @guoqing-noaa & @spanNOAA, Thanks for sharing this info. I'm a little confused though. I see only information like XDR, YDR, HRDR for lat, lon, and time metadata in the yaml provided, but in my bufr file (using readmp which is a bufr tool), I only see XOB, YOB, and DHR that information. And when I try using the provided yaml, all the space/time metadata is missing and results in "no obs assimilated". The DR mnemonics is used for drift like radiosondes and I see something mentioned for aircraft profiles. But the prepbufr obs that I have don't seem like aircraft profiles since they don't have the DR values. Do either of you have any idea about any of this? Thank you!

For reference, here is some output from readmp on AIRCFT (prepbufr).

[hfe01 mpas_2024052700] readmp AIRCFT 
 message date=  2024052700

MESSAGE TYPE AIRCFT    

001194  SID                        FPY9B  ( 8)CCITT IA5                 STATION IDENTIFICATION                          
006240  XOB                    348.30000  DEG E                         LONGITUDE                                       
005002  YOB                     55.13000  DEG N                         LATITUDE                                        
004215  DHR                     -1.00000  HOURS                         OBSERVATION TIME MINUS CYCLE TIME               
010199  ELV                      10973.0  METER                         STATION ELEVATION                               
055007  TYP                        130.0  CODE TABLE                    PREPBUFR REPORT TYPE                            
055008  T29                         41.0  CODE TABLE                    DATA DUMP REPORT TYPE                           
055009  TSB                          0.0  CODE TABLE                    REPORT SUBTYPE (HAS VARIOUS MEANINGS DEPENDING O
002195  ITP                      MISSING  CODE TABLE                    INSTRUMENT TYPE                                 
050001  SQN                       1164.0  NUMERIC                       REPORT SEQUENCE NUMBER                          
050003  PROCN                        0.0  NUMERIC                       PROCESS NUMBER FOR THIS MPI RUN (OBTAINED FROM S
004214  RPT                     23.00000  HOURS                         REPORTED OBSERVATION TIME                       
004216  TCOR                         0.0  CODE TABLE                    INDICATOR WHETHER OBS. TIME IN "DHR" WAS CORRECT
           <RSRD_SEQ>     0 REPLICATIONS
           <ACID_SEQ>     0 REPLICATIONS
           {PRSLEVLA}     1 REPLICATIONS
004217  RCT                        23.03  HOURS                         RECEIPT TIME                                    
002199  ROLF                     MISSING  CODE TABLE                    AIRCRAFT ROLL ANGLE FLAG                        
033026  MSTQ                     MISSING  CODE TABLE                    MOISTURE QUALITY                                
010082  IALR                     MISSING  M/S                           INSTANTANEOUS ALTITUDE RATE                     
008193  CAT                          6.0  CODE TABLE                    PREPBUFR DATA LEVEL CATEGORY                    
           <P___INFO>     1 REPLICATIONS
           [P__EVENT]     1 REPLICATIONS
007245  POB                        227.3  MB                            PRESSURE OBSERVATION                            
007246  PQM                          2.0  CODE TABLE                    PRESSURE (QUALITY) MARKER                       
007247  PPC                          1.0  CODE TABLE                    PRESSURE EVENT PROGRAM CODE                     
007248  PRC                      MISSING  CODE TABLE                    PRESSURE EVENT REASON CODE                      
           <P__BACKG>     0 REPLICATIONS
           <P__POSTP>     0 REPLICATIONS
           <Q___INFO>     1 REPLICATIONS
           [Q__EVENT]     0 REPLICATIONS
012244  TDO                      MISSING  DEG C                         DEWPOINT TEMPERATURE OBSERVATION (NOT ASSIMILATE
           <Q__BACKG>     1 REPLICATIONS
013250  QOE                      MISSING  PERCENT DIVIDED BY 10         RELATIVE HUMIDITY OBSERVATION ERROR             
013249  QFC                         15.0  MG/KG                         FORECAST (BACKGROUND) SPECIFIC HUMIDITY VALUE   
           <QFC__MSQ>     0 REPLICATIONS
           <Q__POSTP>     0 REPLICATIONS
           <T___INFO>     1 REPLICATIONS
           [T__EVENT]     2 REPLICATIONS
    ++++++  T__EVENT  REPLICATION #     1  ++++++
012245  TOB                        -45.0  DEG C                         TEMPERATURE OBSERVATION                         
012246  TQM                          1.0  CODE TABLE                    TEMPERATURE (QUALITY) MARKER                    
012247  TPC                         15.0  CODE TABLE                    TEMPERATURE EVENT PROGRAM CODE                  
012248  TRC                        531.0  CODE TABLE                    TEMPERATURE EVENT REASON CODE                   
    ++++++  T__EVENT  REPLICATION #     2  ++++++
012245  TOB                        -45.0  DEG C                         TEMPERATURE OBSERVATION                         
012246  TQM                          2.0  CODE TABLE                    TEMPERATURE (QUALITY) MARKER                    
012247  TPC                          1.0  CODE TABLE                    TEMPERATURE EVENT PROGRAM CODE                  
012248  TRC                      MISSING  CODE TABLE                    TEMPERATURE EVENT REASON CODE                   
012243  TVO                      MISSING  DEG C                         NON-Q. CONTROLLED VIRTUAL TEMP OBS (NOT ASSIMILA
           <T__BACKG>     1 REPLICATIONS
012250  TOE                      MISSING  DEG C                         TEMPERATURE OBSERVATION ERROR                   
012249  TFC                        -46.0  DEG C                         FORECAST (BACKGROUND) TEMPERATURE VALUE         
           <TFC__MSQ>     0 REPLICATIONS
           <T__POSTP>     0 REPLICATIONS
           <Z___INFO>     1 REPLICATIONS
           [Z__EVENT]     1 REPLICATIONS
010007  ZOB                      10973.0  METER                         HEIGHT OBSERVATION                              
010246  ZQM                          2.0  CODE TABLE                    HEIGHT (QUALITY) MARKER                         
010247  ZPC                          1.0  CODE TABLE                    HEIGHT EVENT PROGRAM CODE                       
010248  ZRC                      MISSING  CODE TABLE                    HEIGHT EVENT REASON CODE                        
           <Z__BACKG>     1 REPLICATIONS
010249  ZFC                      10902.0  METER                         FORECAST (BACKGROUND) HEIGHT VALUE              
           <ZFC__MSQ>     0 REPLICATIONS
           <Z__POSTP>     0 REPLICATIONS
           <W___INFO>     0 REPLICATIONS
           <DRFTINFO>     0 REPLICATIONS
           [W1_EVENT]     0 REPLICATIONS
           <ACFT_SEQ>     0 REPLICATIONS
           <TURB1SEQ>     0 REPLICATIONS
           <TURB2SEQ>     0 REPLICATIONS
           {TURB3SEQ}     0 REPLICATIONS
           {PREWXSEQ}     0 REPLICATIONS
           {CLOUDSEQ}     0 REPLICATIONS
           {AFIC_SEQ}     0 REPLICATIONS
033249  NRLQMS               ......MMM.L  (11)CCITT IA5                 NRL AIRCRAFT QUALITY CNTRL MARK (ADDED BY PGM PR
           <LATCORSQ>     0 REPLICATIONS
           <LONCORSQ>     0 REPLICATIONS

 >>> END OF SUBSET <<<
spanNOAA commented 4 months ago

@delippi You are right, aircraft observations don't seem like sounding observations. Probably we should use XOB and YOB like you said. I'll further look into it and fix it soon.

spanNOAA commented 4 months ago

Hi @delippi , you can modify the yaml file according to the following figure.

Screenshot 2024-07-25 at 2 46 37 PM

By making this adjustment, you can obtain a converted observation similar to the following. image image image

guoqing-noaa commented 4 months ago

Thank @delippi for testing and @spanNOAA for fixing it. @spanNOAA made a PR https://github.com/rrfs2/rrfs-workflow/pull/79 to correct it in the workflow.

delippi commented 3 months ago

I'm trying to complete hofx and oberror validation for 133/233 (from AIRCAR), but JEDI keeps hanging. I've found that for a few obs, an assertion does not pass in ObsErrorFactorPressureCheck.cc.

36: Assertion failed: logprsl[0] > logprsl[nlevs-1] in compute, line 299 of /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/jedi-assim-aircft/RDASApp.20240620/bundle/ufo/src/ufo/filters/obsfunctions/ObsErrorFactorPressureCheck.cc

I added some print statements before this assertion to print when it is expected to fail and this is the result:

36: iloc               =0
36: logprsl[0]         =-nan
36: logprsl[nlevs-1]   =-nan
36: prsl[0][iloc]      =-3.3688e+38
36: prsl[nlevs-1][iloc]=-3.3688e+38

It is also interesting to note that it fails only when iloc=0, but not every iloc=0 fails. There are at least 7 instances of this in the log file.

I turned off the ObsErrorFactorPressureCheck which allows JEDI to run through without failing. Then I checked the geovals computed from JEDI from processor 36 and iloc=0,1 (figure below). Again this issue is only appearing when iloc=0 and only for a few instances of iloc=0. The problem exists for all geoval variables for this processor at iloc=0; in this case, that is both air_temperature and air_pressure. I don't see anything wrong with the IODA observation file. Could this be an indexing issue when assigning obs to processors? geovals

delippi commented 3 months ago

I've made some progress on the missing value issue. First, in https://github.com/JCSDA-internal/ufo/issues/3422 I showed that when changing the observation distribution that that iloc=0 was just a coincidence. The problem was only occurring within the first 80 obs (out of some 23,000) and ob 1 was going on processor 1 as the first ob, ob 2 on processor 2 as the first ob, etc. And with only 80 processors is why iloc=0 seemed to have been the problem.

The problem seems to be with the observation location. I did a combination of python and @CoryMartin-NOAA's neat "ncdump to text, edit, and ncgen" method. Here is what I did, I wrote some python to select an arbitrary list of ob indices from the GSI-IODA (and GSI-geovals) and save them to a new file. I selected ob numbers 6 and 7. (I chose these numbers because iloc=6 was know to produce the missing values and iloc=7 was known to not have this problem; I reconfirmed this when running only these two obs). Then I started modifying the ncdump text file value by value and narrowed it down to the observation longitude.

Here is the original lon values for 6 and 7 respectively: longitude = 235.157, 235.468 ; Simply changing the first to match the second works now!

These obs all come from the GSI-IODA so they shouldn't be outside the domain. Is there a domain check that I should try to use? Any ideas would be appreciated! @ShunLiu-NOAA, @SamuelDegelia-NOAA, @guoqing-noaa, @HuiLiu-NOAA, @xyzemc, @TingLei-NOAA

CoryMartin-NOAA commented 3 months ago

Perhaps they are outside of the domain according to JEDI? Are they close to the edge?

delippi commented 3 months ago

Thanks @CoryMartin-NOAA, yep! looks like it is outside the domain for JEDI. The red dot is the original lat/lon. The gray shaded is the domain. Capture

SamuelDegelia-NOAA commented 3 months ago

Nice find @delippi! Does the background check filter work for removing these obs near the boundary? Or does the Assertion failed error occur before that filter can go through?

guoqing-noaa commented 3 months ago

@SamuelDegelia-NOAA would it be possible that you extend your offline LETKF domain check tool to be more general so that we can use it to remove all observations outside our DA domain? JEDI has been working on adding a model domain check filter but we don't know when that will be ready to use.

delippi commented 3 months ago

@SamuelDegelia-NOAA, the background check doesn't seem to help. Can you please share your domain check tool? That is probably the best way to go, for now.

SamuelDegelia-NOAA commented 3 months ago

@delippi and @guoqing-noaa I do not think the simple domain check I have been using for LETKF would work here. It only takes the max/min coordinates of the domain and throws out anything outside that. So the check is mainly used to throw out obs very far away that would not be included in any halo regions. Since the minimum domain longitude would be smaller than this plotted ob (since the domain extends further west), it would not end up throwing this ob away.

But this does suggest the usage for a smarter offline domain check. I will see if there is anything I can come up with using python.

CoryMartin-NOAA commented 3 months ago

if the H(x) is a missing value, is it not assimilated? If that is the case, then what is the harm of having these here? What we need (if we don't already) is have a reducedObsSpace that is the result of only non-missing H(x) values.

delippi commented 3 months ago

@CoryMartin-NOAA, right, if the H(x) is missing it is not assimilated, but the problem is when I have ObsErrorFactorPressureCheck (for oberror inflation) turned on, the geovals that that function uses are missing which gets caught in Assertion failed: logprsl[0] > logprsl[nlevs-1]

I agree, it seems like a reducedObsSpace would be what we want. What is the status on that?

CoryMartin-NOAA commented 3 months ago

Reduced obs space works, but I'm not sure if it works in this case. Regardless, sounds more like the ObsErrorFactorPressureCheck should check for missing values and return if that location is missing a value.

SamuelDegelia-NOAA commented 3 months ago

@CoryMartin-NOAA Something I have been wondering for a little bit - do you know how or if the filter order matters? As in, would having the background check filter first (with the ReduceObsSpace filter action) have any impact?

CoryMartin-NOAA commented 3 months ago

The only order that matters is pre, prior, post. Other than that it's the order it is read in (I think). https://jointcenterforsatellitedataassimilation-jedi-docs.readthedocs-hosted.com/en/latest/inside/jedi-components/ufo/qcfilters/introduction.html#order-of-filter-application

SamuelDegelia-NOAA commented 3 months ago

Okay, thank you very much for sharing. Will bookmark this page.

delippi commented 3 months ago

@CoryMartin-NOAA, do you have any examples of the reducedObsSpace function? I want to try that first. Then I will see about adding a check in ObsErrorFactorPressureCheck.

CoryMartin-NOAA commented 3 months ago

@CoryMartin-NOAA, do you have any examples of the reducedObsSpace function? I want to try that first. Then I will see about adding a check in ObsErrorFactorPressureCheck.

I'll have to defer to @emilyhcliu or @TravisElless-NOAA on this one

SamuelDegelia-NOAA commented 3 months ago

Here is an example of using reduce obs space with the gaussian thinning filter. I guess the idea would be to try using this filter action with the background check?

CoryMartin-NOAA commented 3 months ago

@SamuelDegelia-NOAA I don't think that will currently work, but it's worth a shot. I think it only will do a reduced obs space before GeoVaLs (at the moment)

SamuelDegelia-NOAA commented 3 months ago

Ah you're right @CoryMartin-NOAA, it can only be used for pre-filters

SamuelDegelia-NOAA commented 3 months ago

Perhaps the LAMDomainCheck could be used for your FV3-LAM tests with the reduce obs space filter action. An example of that is shown for the Bounds Check filter in RDASApp/sorc/fv3-jedi/test/testinput/3dvar_lam_cmaq.yaml.

delippi commented 3 months ago

I wrote my own offline domain check to test the data. It seems to have been a success. The domain check also seems to toss a little more than I was expecting, but might not necessarily be wrong. Original ob counts were closer to 23,000! Nonetheless, after using this domain check the results are as expected. I assume the slight oberror differences are simply due to GSI vs JEDI geovals and are not of concern to me. fv3jedi_vs_gsi_validation_133_airTemperature

emilyhcliu commented 3 months ago

@CoryMartin-NOAA, right, if the H(x) is missing it is not assimilated, but the problem is when I have ObsErrorFactorPressureCheck (for oberror inflation) turned on, the geovals that that function uses are missing which gets caught in Assertion failed: logprsl[0] > logprsl[nlevs-1]

I agree, it seems like a reducedObsSpace would be what we want. What is the status on that?

The reduce obs space capability is already in UFO develop. It can only be used with pre filters.
Add reduce obs space under action for a pre filter.

Example: 20240808 JEDI T2O (1)

xyzemc commented 3 months ago

I use the following filter for check if the obs is outside of the MPAS domain:

Regional Domain Check

  • filter: Bounds Check fileter variables:
  • name: brightnessTemperature test variables:
  • name: LAMDomainCheck@ObsFunction options: map_projection: circle save: true cenlat: 38.5 cenlon: -97.5 radius: 2500 #km minvalue: 1.0

But this is a temporary solution before Hui's code work.

delippi commented 3 months ago

Finished validation of obtype *33. The problem was observations outside of the domain (even when using GSI-IODA obs). For now, I'm just using an offline domain check filter since I haven't been able to get any of the suggestions above to work yet.

The domain check creates a convex hull (red box) approximating the domain (blue) to check for points within. It only takes about 15 seconds for 23k obs. More complex shapes could be achieved, but would require python tools that are not currently installed (newer version of shapely, for example). domain_check

Here are the remaining *33 validation plots. The final ob errors seem to not match as well as other aircraft types, however, there are many more obs and so also many more obs at lower levels which is known to produce differences due to how geovals are created between GSI and JEDI. Only GSI-geovals are fed into vertInterp in these tests, the ob error inflation function still uses JEDI-geovals.

fv3jedi_vs_gsi_validation_133_airTemperature fv3jedi_vs_gsi_validation_133_specificHumidity fv3jedi_vs_gsi_validation_233_windEastward

delippi commented 3 months ago

@guoqing-noaa, yes it was very straight forward to adapt my domain check to MPAS grid. The blue box is the actual grid lat/lons where the red convex hull is an approximation of the blue box and meant to be slightly within the blue box (and can be tuned). @SamuelDegelia-NOAA also has an offline domain check that we should compare with. It also seems there are others working on an offline domain check too? This is just a simple solution and should at least get the job done for testing purposes, at least. domain_check_mpas

guoqing-noaa commented 3 months ago

@delippi Super!

Yes, it will be good if we can combine those offline tools.

Just curious about the difference between your plotted domain with the one plotted by @spanNOAA (is it because of the map projection?):

image

delippi commented 3 months ago

@guoqing-noaa, thanks for showing @spanNOAA's plot. It could be map projection its hard to tell. With cartopy, I used projection=ccrs.PlateCarree(central_longitude=0). Do you happen to know what was used for this plot? I can try to reproduce it with that projection on confirm.

delippi commented 3 months ago

@guoqing-noaa, I think I got it. It seems like projection=ccrs.LambertConformal() was used. Here is the result with that. Differences are just map projection although some of the boundary extents look a little different. The way I have some thing plotted look strange with this projection.

domain_check_mpas_LC

Edit: I double checked the min/max lat/lons and added grid lines to my plot.

Max/Min Lat: 61.212459564208984, 19.301267623901367
Max/Min Lon: -48.77972412109375, -146.12791442871094
spanNOAA commented 3 months ago

This seems unusual to me. I'm also using the Lambert projection and am not seeing any blank areas within the domain. Which plotting function did you use?

delippi commented 3 months ago

@spanNOAA, don't worry about the blank areas. That was because I was plotting the domain as bunch of lines instead of filling it. I've since fixed it, but still have a strange angle along southern boundary. Possibly an artifact in how I'm plotting, but I'm not sure.

I'm more concerned about the extent of the domain edges. What you have plotted looks more like the FV3-JEDI case. Here is an MPAS and FV3 examples.

domain_check_MPAS_mesonet

domain_check_FV3

guoqing-noaa commented 3 months ago

@delippi For the MPAS domain, the north line is much smooth than the south line. Is it possible to make the south line smooth as the north line? I am also curious about why we don't let the domain boundary overlaps with the concave boundary?

spanNOAA commented 3 months ago

The plot I created shows an MPAS domain generated by Chunhua. The unusual angle along the southern boundary might be related to the plotting method you used? I used a dot plot to visualize the domain.

delippi commented 3 months ago

@delippi For the MPAS domain, the north line is much smooth than the south line. Is it possible to make the south line smooth as the north line?

I'm not sure at the moment why it is doing that. I will look into it more. Probably this:

The unusual angle along the southern boundary might be related to the plotting method you used?

I am also curious about why we don't let the domain boundary overlaps with the concave boundary?

If I understand the question correctly, the best way (I think) to check if an ob is within a domain is to use a hull (the red box) which I've seen explained as "A Convex Hull is the smallest convex shape (or polygon) that can enclose a set of points in a plane (or in higher dimensions). Imagine stretching a rubber band around the outermost points in a set; the shape that the rubber band forms is the convex hull." So, if there are any concave points between vertices, then there would be whitespace between the red and blue box. I've shrunk the convex hull such that there wouldn't be such whitespace which, of course, in tern means that it is going to be not an exact match of the domain grid. I believe there are more sophisticated ways to get a more accurate concave hull than this, but I think this was the best we could do with the python modules currently available. I think @SamuelDegelia-NOAA's tools uses concave_hull but he had to load his own conda environment to get to work.

guoqing-noaa commented 3 months ago

@HuiLiu-NOAA Does your PR https://github.com/JCSDA-internal/mpas-jedi/pull/976 eliminate the need to use an offline regional domain check?

HuiLiu-NOAA commented 3 months ago

I think the mpas-jedi team is in a better position to answer this question.

SamuelDegelia-NOAA commented 3 months ago

Great work on the domain check @delippi. If your method can use standard python modules loaded in RDASApp then it is likely a better route than mine. Is it easy to adjust how close the red hull is to the model domain edges? If so it might be worth trying to see how close we can get it without getting any bad obs through

delippi commented 3 months ago

While the domain check is/should be in JEDI already, I still think its good to have this offline tool available.

@SamuelDegelia-NOAA, thanks. Yes, my version uses only the python modules available in RDASApp. The actual checking for obs in/out of the domain is really quick. Like 1s for the 500k mesonet ob, but it still takes about 7.5 min to copy the sliced data. I believe yours might have been quicker? We should compare and maybe use your method for that part. It is very easy to shrink the red hull edges and get as close as possible to the edges without letting any bad obs in. Right now I just have it scaled down to shrink by 4% which was me eyeballing it. When I switched the plot to the LambertConformal projection it appears to be contained further in the blue domain edges. I think the LC projection is a little misleading as there is clearly green dot outside of the red boundary (look at the west edge of the last image shared). I think this is a more accurate image. This image uses the 4% shrink so the east and west concave blue edges touch the red edges.

domain_check_FV3

SamuelDegelia-NOAA commented 3 months ago

@delippi, that is great info. I agree that this tool will be useful to have.

My domain check for the ~500k mesonet obs takes 1) 48 sec to check if each ob is in the domain, and 2) 3 sec to write out the new file. The writing is done by creating a new netcdf file and then copying each group/variable into the new file but only selecting the indices for obs that are inside the domain. If we want to incorporate my part from (2) into your domain check, then you could look at /scratch1/NCEPDEV/da/Samuel.Degelia/jedi_tools/offline_domaincheck_flex.py starting at line 60.

delippi commented 3 months ago

@SamuelDegelia-NOAA, I got them combined! For the 500k mesonet obfile it took: 3 s to find obs within domain, 5 s to write out file, 33s to create plot. Run times vary a little bit each time its run. Pretty nice! Thanks for sharing your method for creating the new obfile! It still needs a little cleaned up, but it works great! /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2/tools/filter_domain_obs.py Run like:

module use /scratch2/NCEPDEV/fv3-cam/Donald.E.Lippi/RRFSv2_demo/RDASApp.20240620/modulefiles
module load EVA/hera
python filter_domain_obs.py <obs_filename>

Note that I have a custom time function that you also need in the run directory or PYTHONPATH or just remove those lines.

SamuelDegelia-NOAA commented 3 months ago

This is great, glad things are working so quickly. This will be a great tool to have!

HuiLiu-NOAA commented 2 months ago

hi, @delippi : Would this offline python tool generic enough to work for the HAFS domain as well? If so, it might help for the AMV test case by @XuLu-NOAA, @JingCheng-NOAA for the moment.

delippi commented 1 month ago

Phase 2 Validation (no reliance on any GSI-derived IODA or GeoVaLs) initial results.

Methodology:

  1. Run bufr2ioda:

    • Use the updated bufr2ioda yamls to create ioda_${obtype}.nc (and domain check versions) of the GSI prepbufr data.
  2. Run GSI:

    • Split_by_subset prepbufr
    • Edit convinfo to insure only one ob type is assimilated at a time (e.g., comment out all lines that aren't for mesonet 188 temperature).
    • Edit gsiparm.anl to ensure that only one ob type is assimilated (e.g., replace prepbufr with MSONET to assimilate mesonet data).
    • Repeat steps for every ob type
  3. Run FV3-JEDI:

    • Use domain checked IODA files
    • Use validated yamls (with slight modifications)
    • Repeat for every ob type

Summary: For the most part, the increments and observation counts look reasonable! I do need to figure out why the uv_233 plot has such large increments. More to look into there. Overall, not bad for initial testing and almost no modifications from what is currently in RDASApp!

fv3jedi_vs_gsi_increment_airTemperature_aircar_airTemperature_133 fv3jedi_vs_gsi_increment_airTemperature_aircar_specificHumidity_133 fv3jedi_vs_gsi_increment_airTemperature_aircar_uv_233 fv3jedi_vs_gsi_increment_airTemperature_aircft_airTemperature_134 fv3jedi_vs_gsi_increment_airTemperature_aircft_airTemperature_135 fv3jedi_vs_gsi_increment_airTemperature_aircft_specificHumidity_134 fv3jedi_vs_gsi_increment_airTemperature_aircft_uv_230 fv3jedi_vs_gsi_increment_airTemperature_aircft_uv_231 fv3jedi_vs_gsi_increment_airTemperature_aircft_uv_234 fv3jedi_vs_gsi_increment_airTemperature_aircft_uv_235 fv3jedi_vs_gsi_increment_airTemperature_aircft_airTemperature_130 fv3jedi_vs_gsi_increment_airTemperature_aircft_airTemperature_131

delippi commented 6 days ago

Phase 2 Aircar obtype=33 update Following some updates from PR#188 (msonet) that I've included here, we get some slightly better results than before, but the winds 233 still have too large of increments.

airTemperature fv3jedi_vs_gsi_increment_airTemperature_aircar_airTemperature_133 hist_aircar_airTemperature_133

specificHumidity fv3jedi_vs_gsi_increment_airTemperature_aircar_specificHumidity_133 hist_aircar_specificHumidity_133

winds fv3jedi_vs_gsi_increment_airTemperature_aircar_winds_233 hist_aircar_winds_233