JCSDA-internal / ioda-converters

Various converters for getting obs data in and out of IODA

gsi-ncdiag Converter: Eliminate nrecs #244

Closed markjolah closed 4 years ago

markjolah commented 4 years ago

Currently the gsi-ncdiag converter package produces improperly formatted NetCDF IODA files. These files retain an nrecs dimension, and some also retain variables indexed by nrecs. This is a problem because these invalid nrecs-indexed variables are preserved in the HofX output IODA files, which can cause problems when concatenating the output HofX files from each processor. It also makes it impossible to concatenate GSI observation IODA files to form longer time windows, as the windows may disagree on the nrecs dimension.
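
As a rough illustration of the concatenation problem, the sketch below compares the dimension sizes of two files that are to be joined along the location dimension and reports every other dimension whose size differs. The function name `conflicting_dims`, the dimension name `nlocs` as the concatenation axis, and all sizes other than `nrecs = 4333` are assumptions made up for the example; they are not taken from the actual files.

```python
def conflicting_dims(headers, concat_dim="nlocs"):
    """Return {dim: sorted sizes} for every dimension other than
    concat_dim whose size is not identical across all files."""
    sizes = {}
    for dims in headers:
        for name, size in dims.items():
            if name != concat_dim:
                sizes.setdefault(name, set()).add(size)
    return {name: sorted(vals) for name, vals in sizes.items() if len(vals) > 1}


# Hypothetical dimension tables for two time windows of one obs type.
# Only nrecs = 4333 appears in this issue; every other number is invented.
window_a = {"nlocs": 120000, "nvars": 1, "nstring": 50, "nrecs": 4333}
window_b = {"nlocs": 98000, "nvars": 1, "nstring": 50, "nrecs": 4127}

print(conflicting_dims([window_a, window_b]))
# {'nrecs': [4127, 4333]}

# With nrecs dropped, nothing blocks concatenation along nlocs.
without_nrecs = [{k: v for k, v in d.items() if k != "nrecs"}
                 for d in (window_a, window_b)]
print(conflicting_dims(without_nrecs))
# {}
```

Once nrecs is gone, the only dimension that varies between windows is the one being concatenated along.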

To solve this issue, the gsi-ncdiag Python package will need to be updated and tested with the GSI input files. I don't currently have these original files, so if someone could provide them, I may be able to help debug this issue.

Here is a list of GSI output files from the July 2019 run that @emilyhcliu has completed. For each file, any references to the nrecs dimension or attribute should be removed, and especially any variables indexed by nrecs. In particular, some of the conventional obs are producing two different variables for Station_ID with different capitalization and indexing dimensions.

F: aircraft_obs_2019070100.nc4
    nrecs = 4333 ;
        :nrecs = 4333 ;
F: aircraft_q_obs_2019070100.nc4
    nrecs = 276 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 276 ;
F: aircraft_tsen_obs_2019070100.nc4
    nrecs = 4333 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 4333 ;
F: aircraft_uv_obs_2019070100.nc4
    nrecs = 4135 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 4135 ;
F: airs_aqua_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_aqua_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_metop-b_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_n15_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_n18_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: amsua_n19_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: atms_n20_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: atms_npp_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: avhrr_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: avhrr_n18_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: cris-fsr_n20_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: cris-fsr_npp_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: gome_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: gome_metop-b_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: gps_bend_obs_2019070100.nc4
    nrecs = 484 ;
F: hirs4_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: hirs4_metop-b_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: hirs4_n19_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: iasi_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: iasi_metop-b_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: mhs_metop-a_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: mhs_metop-b_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: mhs_n19_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: omi_aura_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: ompsnp_npp_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: ompstc8_npp_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: rass_tv_obs_2019070100.nc4
    nrecs = 17 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 17 ;
F: satwind_uv_obs_2019070100.nc4
    nrecs = 20 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 20 ;
F: sbuv2_n19_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: scatwind_uv_obs_2019070100.nc4
    nrecs = 219039 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 219039 ;
F: seviri_m08_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: seviri_m11_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sfc_obs_2019070100.nc4
    nrecs = 11199 ;
        :nrecs = 11199 ;
F: sfc_ps_obs_2019070100.nc4
    nrecs = 11199 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 11199 ;
F: sfc_q_obs_2019070100.nc4
    nrecs = 4270 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 4270 ;
F: sfcship_obs_2019070100.nc4
    nrecs = 1574 ;
        :nrecs = 1574 ;
F: sfcship_ps_obs_2019070100.nc4
    nrecs = 1574 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 1574 ;
F: sfcship_q_obs_2019070100.nc4
    nrecs = 420 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 420 ;
F: sfcship_tsen_obs_2019070100.nc4
    nrecs = 398 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 398 ;
F: sfcship_tv_obs_2019070100.nc4
    nrecs = 1105 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 1105 ;
F: sfcship_uv_obs_2019070100.nc4
    nrecs = 1447 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 1447 ;
F: sfc_tsen_obs_2019070100.nc4
    nrecs = 118 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 118 ;
F: sfc_tv_obs_2019070100.nc4
    nrecs = 11055 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 11055 ;
F: sfc_uv_obs_2019070100.nc4
    nrecs = 11021 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 11021 ;
F: sndrd1_g15_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sndrd2_g15_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sndrd3_g15_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sndrd4_g15_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sondes_obs_2019070100.nc4
    nrecs = 721 ;
        :nrecs = 721 ;
F: sondes_ps_obs_2019070100.nc4
    nrecs = 647 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 647 ;
F: sondes_q_obs_2019070100.nc4
    nrecs = 346 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 346 ;
F: sondes_tsen_obs_2019070100.nc4
    nrecs = 608 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 608 ;
F: sondes_tv_obs_2019070100.nc4
    nrecs = 647 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 647 ;
F: sondes_uv_obs_2019070100.nc4
    nrecs = 721 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 721 ;
F: ssmis_f17_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: ssmis_f18_obs_2019070100.nc4
    nrecs = 1 ;
    int rec_id@RecMetaData(nrecs) ;
        :nrecs = 1 ;
F: sst_obs_2019070100.nc4
    nrecs = 993 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 993 ;
F: vadwind_uv_obs_2019070100.nc4
    nrecs = 150 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 150 ;
F: windprof_uv_obs_2019070100.nc4
    nrecs = 3 ;
    char Station_ID@RecMetaData(nrecs, nstring) ;
        :nrecs = 3 ;
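
A listing like the one above can be reproduced, or re-checked after the converter is fixed, by scanning each file's header text for whole-word mentions of nrecs. The sketch below works on header text such as the output of `ncdump -h`; the helper name `nrecs_references` and the sample header are assumptions for illustration. In the sample, only the three nrecs lines match entries in the listing, while the `nlocs` size, the `nstring` size, and the `station_id@MetaData` variable are invented.

```python
import re


def nrecs_references(cdl_text, name="nrecs"):
    """Return the stripped header lines that mention `name` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [line.strip() for line in cdl_text.splitlines() if pattern.search(line)]


# Hypothetical header text in ncdump -h style. Only the three nrecs lines
# mirror the sondes_q entry in the listing; the rest is made up.
sample = """\
netcdf sondes_q_obs_2019070100 {
dimensions:
    nlocs = 5000 ;
    nrecs = 346 ;
    nstring = 50 ;
variables:
    char Station_ID@RecMetaData(nrecs, nstring) ;
    char station_id@MetaData(nlocs, nstring) ;
// global attributes:
        :nrecs = 346 ;
}
"""

for line in nrecs_references(sample):
    print(line)
# nrecs = 346 ;
# char Station_ID@RecMetaData(nrecs, nstring) ;
# :nrecs = 346 ;
```

A correctly converted file should yield an empty list from this check.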
emilyhcliu commented 4 years ago

@markjolah I will look into this.

emilyhcliu commented 4 years ago

@markjolah Here is my plan: I am modifying the IODA converter to remove everything related to nrecs and to add units. I am dealing with satellite radiance data first and will send you AMSU-A from 2019070100 for testing soon. Once you give me a green light, I will regenerate the whole month for all instruments. I can generate the entire month of obs data in 1-2 days (if machines behave normally).

emilyhcliu commented 4 years ago

@markjolah @cyberbass The updated obs files for 2019070100 can be found on s4 at the following location: /data/users/eliu/AOP75/2019070100_new/obs

Things fixed in this updated data set are:
(1) Removed obsolete "nrecs" dimension from obs files
(2) Added units to GSI reference variables (e.g., @GsiHofX)
(3) Added @GsiHofX and @GsiHofXBc for sea surface temperature in SSTS obs file

Please let me know if this new set of obs files works for jedi-rapids.

srherbener commented 4 years ago

@emilyhcliu, just want to double check that you will eventually be submitting a PR with these modifications. Thanks!

Also, how much work would it be to regenerate the GSI-related obs files in the IODA test file set (April 15, 2018 dated obs files) using the new converter once it's completed? The concern is that something gets changed that causes a test failure, and we don't discover what the issue is for a long time. Running jedi-rapids on the July 2019 files, however, should go a long way toward flushing out these kinds of bugs, so it may not be absolutely necessary to update the April 15, 2018 files.

I'm thinking that if it's low effort we should go ahead and update the April 15, 2018 files, but if it's a lot of work we may want to hold off on that task. Thanks!

emilyhcliu commented 4 years ago

@srherbener I would like to update ioda-converters in two steps (PRs). The first one would be the following:
(1) add missing units for some variables
(2) remove obsolete variables and add new variables
(3) add SST
(4) update GPS and Ozone data types (currently used in operation)

The second one is to remove the "nrecs" dimension. The GPSRO people told me that they are still using nrecs in GPSRO data. Do we still want to remove nrecs?

srherbener commented 4 years ago

@emilyhcliu, Thanks for bringing this to our attention. We do want to get rid of nrecs in the ioda obs files, and I think your plan is good.

The reader in IODA ignores nrecs in the file and generates the nrecs value and associated record numbers itself. It's likely that we can remove nrecs and the @RecMetaData variables and the GPSRO operators would not notice. I'll work this out with the GPSRO folks. We should be able to get this straightened out before your second PR.
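
To make the idea of generating record numbers at read time concrete, here is a minimal sketch that derives a record number for each location, plus the resulting record count, from a per-location grouping key. This is not the IODA reader's code: the function name `assign_record_numbers`, the choice of station ID as the grouping key, and the sample IDs are all assumptions for illustration.

```python
def assign_record_numbers(keys):
    """Give each location a record number based on its grouping key.

    Keys are numbered in order of first appearance. Returns the list of
    per-location record numbers and the number of distinct records.
    """
    first_seen = {}
    rec_nums = []
    for key in keys:
        if key not in first_seen:
            first_seen[key] = len(first_seen)
        rec_nums.append(first_seen[key])
    return rec_nums, len(first_seen)


# Made-up station IDs, one per location (assumed grouping key).
station_ids = ["72469", "72469", "71600", "72469", "71600", "03005"]

rec_nums, nrecs = assign_record_numbers(station_ids)
print(rec_nums, nrecs)
# [0, 0, 1, 0, 1, 2] 3
```

Because both values can be recomputed from per-location data, nothing indexed by nrecs has to be stored in the file.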

srherbener commented 4 years ago

PRs #256, #257, and #258 have resolved this issue. The original PR (#255) that this issue was tied to was closed before merging and replaced by #256, which has been merged. Therefore I'm going to close this issue now.