add the amsua_n19_qc_bc.yaml into templates

xyzemc commented 1 month ago

This PR is for commit the work have been done in issue#180

The ctest has been passed with this new yaml file:

ctest -R rrfs_mpasjedi_2024052700_Ens3Dvar Test project /scratch2/NCEPDEV/fv3-cam/Xiaoyan.Zhang/noscrub/JEDI/RDASApp_09272024/build/rrfs-test Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar 1/1 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ... Passed 122.34 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary: mpi = 122.34 secproc (1 test) rdas-bundle = 122.34 secproc (1 test) script = 122.34 sec*proc (1 test)

Total Test time (real) = 123.18 sec

SamuelDegelia-NOAA commented 1 month ago

@xyzemc I added a new wiki page detailing additional changes to be made when updating the ctests for a new observation type. Please see here: https://github.com/NOAA-EMC/RDASApp/wiki/Adding-new-observation-types-for-the-RDASApp-ctests

As part of your PR, could you also add the replaceable strings to the yaml, update/run gen_yaml_ctest.sh, and regenerate the test reference data? Please let me know if you have any issues with this - I would like adding new observation types to the ctests to be easy but this might be an iterative process to make it so.

SamuelDegelia-NOAA commented 1 month ago

Another idea would be to let users add their individual yamls files and then we could do a later PR to update the ctests for those new yamls. That could be easier such that only a code manager would need to run/update the gen_yaml_ctest.sh script. We could also wait until we have a few new yaml types added to update the ctests with. Any thoughts on this @guoqing-noaa or @delippi?

delippi commented 1 month ago

Another idea would be to let users add their individual yamls files and then we could do a later PR to update the ctests for those new yamls. That could be easier such that only a code manager would need to run/update the gen_yaml_ctest.sh script. We could also wait until we have a few new yaml types added to update the ctests with. Any thoughts on this @guoqing-noaa or @delippi?

@SamuelDegelia-NOAA, it seems okay to me to do it this way.

SamuelDegelia-NOAA commented 1 month ago

@xyzemc You can hold off on what I discussed in my last message - we can just plan to update the ctests at a later time. So it should be okay to just add your yaml as is once @delippi's questions are addressed.

xyzemc commented 1 month ago

@xyzemc You can hold off on what I discussed in my last message - we can just plan to update the ctests at a later time. So it should be okay to just add your yaml as is once @delippi's questions are addressed.

@SamuelDegelia-NOAA Sounds good to me. Could you please put the data on role account first. The data location on my account is: /scratch2/NCEPDEV/fv3-cam/Xiaoyan.Zhang/noscrub/JEDI/RDASApp_09272024/expr/mpas_2024052700/data/obs/amsua_n19_obs

SamuelDegelia-NOAA commented 1 month ago

@SamuelDegelia-NOAA Sounds good to me. Could you please put the data on role account first. The data location on my account is: /scratch2/NCEPDEV/fv3-cam/Xiaoyan.Zhang/noscrub/JEDI/RDASApp_09272024/expr/mpas_2024052700/data/obs/amsua_n19_obs

Done!

xyzemc commented 1 month ago

@xyzemc, I just have a few questions:

It seems like you only have MPAS-JEDI results. Have you or do you plan to compare results in FV3-JEDI vs GSI? Both hofx validation where you provide FV3-JEDI with GSI-IODA (this is what I refer to as Phase 1) and making sure all the qc parts work as expected without using GSI-IODA (this is what I refer to as Phase 2). Either way, this is great work and we can definitely add this to the "validated_yamls" even without those tests done yet--since almost all of those yamls are still a work in progress.

Just a minor detail. The file name of this yaml is amsua_n19_qc_bc.yaml. Would it make sense to remove the _qc_bc part and match the name of the obs space in the yaml?

Another minor consistency detail. For conventional obs, we have been using the naming convention for the ioda data as like ioda_msonet.nc--so use something like ioda_amsua_n19.nc. That shouldn't be too hard to change here, but of course you would have to change the name in the converter step too.

Is the hofx file supposed to have rass_ prefixed to it?

@delippi Answers for each question are:

I haven't compared FV3-JEDI with GSI yet for amsua and atms. But will do that in the near future.
I have change the yaml file name as you suggested.
obs name changed to ioda_amsua_n19.nc
I think 'rass_' prefix is what I copied from GDASApp yaml file. But I can definitely remove the prefix.

xyzemc commented 1 month ago

@xyzemc You can hold off on what I discussed in my last message - we can just plan to update the ctests at a later time. So it should be okay to just add your yaml as is once @delippi's questions are addressed.

I have done everything that guided in the wiki and addressed @delippi 's question with modification. The new ctest results with the super yaml file is:

`ctest -R rrfs_mpasjedi_2024052700_Ens3Dvar Test project /scratch2/NCEPDEV/fv3-cam/Xiaoyan.Zhang/noscrub/JEDI/RDASApp_09272024/build/rrfs-test Start 3: rrfs_mpasjedi_2024052700_Ens3Dvar 1/1 Test #3: rrfs_mpasjedi_2024052700_Ens3Dvar ... Passed 337.52 sec

100% tests passed, 0 tests failed out of 1

Label Time Summary: mpi = 337.52 secproc (1 test) rdas-bundle = 337.52 secproc (1 test) script = 337.52 sec*proc (1 test)

Total Test time (real) = 338.51 sec `

delippi commented 1 month ago

@xyzemc, I just have a few questions:

It seems like you only have MPAS-JEDI results. Have you or do you plan to compare results in FV3-JEDI vs GSI? Both hofx validation where you provide FV3-JEDI with GSI-IODA (this is what I refer to as Phase 1) and making sure all the qc parts work as expected without using GSI-IODA (this is what I refer to as Phase 2). Either way, this is great work and we can definitely add this to the "validated_yamls" even without those tests done yet--since almost all of those yamls are still a work in progress.

Just a minor detail. The file name of this yaml is amsua_n19_qc_bc.yaml. Would it make sense to remove the _qc_bc part and match the name of the obs space in the yaml?

Another minor consistency detail. For conventional obs, we have been using the naming convention for the ioda data as like ioda_msonet.nc--so use something like ioda_amsua_n19.nc. That shouldn't be too hard to change here, but of course you would have to change the name in the converter step too.

Is the hofx file supposed to have rass_ prefixed to it?

@delippi Answers for each question are:

I haven't compared FV3-JEDI with GSI yet for amsua and atms. But will do that in the near future.

I have change the yaml file name as you suggested.

obs name changed to ioda_amsua_n19.nc

I think 'rass_' prefix is what I copied from GDASApp yaml file. But I can definitely remove the prefix.

@xyzemc Thanks for taking those minor comments into consideration! It looks like you forgot to remove the amsua_n19_qc_bc.yaml in this PR. There are two yamls in the "Files Changed" tab.

Sorry that I also just noticed the same minor consistency issues with the atms. I should've caught those before. You don't necessarily have to make those changes in this PR. I assume you will have to make some modifications when you do FV3-JEDI vs GSI testing. You can make them there. I just wanted to let you know so you can keep that in mind.

One last thing. I noticed you have a few paths that start with data/obs_ctest. I just wonder if those should be something else? I'm just thinking about when we get to a cycling system, do we want that data stored in that particular directory name? It looks like you had it named differently recently too. I think how you had it before seems reasonable:

       obs bias:
         input file: data/obs/amsua_n19.satbias.nc4
         output file: data/bc/out_amsua_n19.satbias.nc

xyzemc commented 1 month ago

@xyzemc, I just have a few questions:

It seems like you only have MPAS-JEDI results. Have you or do you plan to compare results in FV3-JEDI vs GSI? Both hofx validation where you provide FV3-JEDI with GSI-IODA (this is what I refer to as Phase 1) and making sure all the qc parts work as expected without using GSI-IODA (this is what I refer to as Phase 2). Either way, this is great work and we can definitely add this to the "validated_yamls" even without those tests done yet--since almost all of those yamls are still a work in progress.

Just a minor detail. The file name of this yaml is amsua_n19_qc_bc.yaml. Would it make sense to remove the _qc_bc part and match the name of the obs space in the yaml?

Another minor consistency detail. For conventional obs, we have been using the naming convention for the ioda data as like ioda_msonet.nc--so use something like ioda_amsua_n19.nc. That shouldn't be too hard to change here, but of course you would have to change the name in the converter step too.

Is the hofx file supposed to have rass_ prefixed to it?

@delippi Answers for each question are:

I haven't compared FV3-JEDI with GSI yet for amsua and atms. But will do that in the near future.

I have change the yaml file name as you suggested.

obs name changed to ioda_amsua_n19.nc

I think 'rass_' prefix is what I copied from GDASApp yaml file. But I can definitely remove the prefix.

@xyzemc Thanks for taking those minor comments into consideration! It looks like you forgot to remove the amsua_n19_qc_bc.yaml in this PR. There are two yamls in the "Files Changed" tab.

Sorry that I also just noticed the same minor consistency issues with the atms. I should've caught those before. You don't necessarily have to make those changes in this PR. I assume you will have to make some modifications when you do FV3-JEDI vs GSI testing. You can make them there. I just wanted to let you know so you can keep that in mind.

One last thing. I noticed you have a few paths that start with data/obs_ctest. I just wonder if those should be something else? I'm just thinking about when we get to a cycling system, do we want that data stored in that particular directory name? It looks like you had it named differently recently too. I think how you had it before seems reasonable:
       obs bias:
         input file: data/obs/amsua_n19.satbias.nc4
        ` output file: data/bc/out_amsua_n19.satbias.nc`

@delippi the 'obs_ctest' is copied from atms yaml file that should be setup by @SamuelDegelia-NOAA . But I agree with you input file: data/obs/amsua_n19.satbias.nc4 is more reasonable. For the output file: data/bc/out_amsua_n19.satbias.nc I modified to ' output file: out_amsua_n19.satbias.nc`

removed the data/bc to keep the output in the same place as hofx

xyzemc commented 1 month ago

@delippi For the ATMS yaml file, I have noticed the difference from amsua. I plan to change that later because I am working on atms_n20 yaml file right now. When I open PR to merge atms_n20 yaml file, I will modify the atms_npp yaml as well to keep consistency.

SamuelDegelia-NOAA commented 1 month ago

When creating the new ctest structure, I split up the observation files in RDAS_DATA to obs/ and obs_ctest/ directories in case some of the obs we use for the ctests were different from those we were using before. Normally this was okay because only the gen_yaml_ctest.sh script referenced the obs_ctest/ directory to replace the @OBSFILE@ string in the yamls. But because the ATMS/AMSUA data use multiple input observation files (data, bias, lapse rate), some of those paths got hard coded into the yamls. I agree that this isn't an ideal solution given that want these yamls to be flexible and to be used eventually in the workflow.

I think keeping the yaml file general and just referencing an @OBSFILE@ string is ideal. So maybe we should also similar strings for @BIASFILE@ and @LAPSEFILE@, etc. to the yamls in this PR.

delippi commented 1 month ago

When creating the new ctest structure, I split up the observation files in RDAS_DATA to obs/ and obs_ctest/ directories in case some of the obs we use for the ctests were different from those we were using before. Normally this was okay because only the gen_yaml_ctest.sh script referenced the obs_ctest/ directory to replace the @OBSFILE@ string in the yamls. But because the ATMS/AMSUA data use multiple input observation files (data, bias, lapse rate), some of those paths got hard coded into the yamls. I agree that this isn't an ideal solution given that want these yamls to be flexible and to be used eventually in the workflow.

I think keeping the yaml file general and just referencing an @OBSFILE@ string is ideal. So maybe we should also similar strings for @BIASFILE@ and @LAPSEFILE@, etc. to the yamls in this PR.

Sounds good. I just have one comment. We need to keep those gen_yaml scripts as simple as possible! I think you are, but I just needed to say it! hahaha

SamuelDegelia-NOAA commented 1 month ago

I'm trying! But definitely open to any feedback to simplify it more.

SamuelDegelia-NOAA commented 1 month ago

Sorry to keep changing things, but could we change all of the data/obs/... paths to a generic string that can be replaced? See below. This will help us be more generic in these validated_yaml files.

input file: data/obs/amsua_n19.satbias.nc4 -> input file: "@BIASFILE@" tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt -> tlapse: "@LAPSEFILE@" input file: data/obs/amsua_n19.satbias_cov.nc4 -> input file: "@COVFILE@"

Also, could you add an obs localizations section to the yamls, similar to the file here?

Once those are done then I think this PR will be good to go. We can update the ctests at a later time once we have more yamls ready to go.

xyzemc commented 1 month ago

Sorry to keep changing things, but could we change all of the data/obs/... paths to a generic string that can be replaced? See below. This will help us be more generic in these validated_yaml files.

input file: data/obs/amsua_n19.satbias.nc4 -> input file: "@BIASFILE@" tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt -> tlapse: "@LAPSEFILE@" input file: data/obs/amsua_n19.satbias_cov.nc4 -> input file: "@COVFILE@"

Also, could you add an obs localizations section to the yamls, similar to the file here?

Once those are done then I think this PR will be good to go. We can update the ctests at a later time once we have more yamls ready to go.

@SamuelDegelia-NOAA the variable &amsua_n19_tlapse in tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt will be used in the following lines. So I don't think tlapse: "@LAPSEFILE@" will work.

delippi commented 1 month ago

Sorry to keep changing things, but could we change all of the data/obs/... paths to a generic string that can be replaced? See below. This will help us be more generic in these validated_yaml files. input file: data/obs/amsua_n19.satbias.nc4 -> input file: "@BIASFILE@" tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt -> tlapse: "@LAPSEFILE@" input file: data/obs/amsua_n19.satbias_cov.nc4 -> input file: "@COVFILE@" Also, could you add an obs localizations section to the yamls, similar to the file here? Once those are done then I think this PR will be good to go. We can update the ctests at a later time once we have more yamls ready to go.

@SamuelDegelia-NOAA the variable &amsua_n19_tlapse in tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt will be used in the following lines. So I don't think tlapse: "@LAPSEFILE@" will work.

Wouldn't it just be tlapse: &amsua_n19_tlapse "@LAPSEFILE@" or you could remove the anchor and specify everywhere. Either should work. It just requires running gen_yaml each time. Actually, that is how I do all my tests.

xyzemc commented 1 month ago

Sorry to keep changing things, but could we change all of the data/obs/... paths to a generic string that can be replaced? See below. This will help us be more generic in these validated_yaml files. input file: data/obs/amsua_n19.satbias.nc4 -> input file: "@BIASFILE@" tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt -> tlapse: "@LAPSEFILE@" input file: data/obs/amsua_n19.satbias_cov.nc4 -> input file: "@COVFILE@" Also, could you add an obs localizations section to the yamls, similar to the file here? Once those are done then I think this PR will be good to go. We can update the ctests at a later time once we have more yamls ready to go.

@SamuelDegelia-NOAA the variable &amsua_n19_tlapse in tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt will be used in the following lines. So I don't think tlapse: "@LAPSEFILE@" will work.

Wouldn't it just be tlapse: &amsua_n19_tlapse "@LAPSEFILE@" or you could remove the anchor and specify everywhere. Either should work. It just requires running gen_yaml each time. Actually, that is how I do all my tests.

Ok, let me try tlapse: &amsua_n19_tlapse "@LAPSEFILE@"

guoqing-noaa commented 1 month ago

input file: data/obs/amsua_n19.satbias.nc4 -> input file: "@BIASFILE@" tlapse: &amsua_n19_tlapse data/obs/amsua_n19.tlapse.txt -> tlapse: "@LAPSEFILE@" input file: data/obs/amsua_n19.satbias_cov.nc4 -> input file: "@COVFILE@"

@SamuelDegelia-NOAA Could you elaborate why you would like to have these changes? I think it is because we now have obs and obs_ctest?

If so, let's remove obs, rename obs_ctest to obs and add the single ADPUPA obs to obs. It is also expected to combine that standalone single ob test into testinput (and hence testinput_expr will not be needed in the future).

I don't think we need that extra flexibility to specify biasfile, lapsefile, covfile on the fly. Sorry that I've been busy preparing for tomorrow's travel and did not join the discussion early.

SamuelDegelia-NOAA commented 1 month ago

@guoqing-noaa I would like to keep the yamls added to obtype_config/ as generic as possible. That way we can apply these yamls and associated tools for the workflow at a later date. As such, I think we should limit hardcoding file names in there.

The gen_yaml_ctest.sh can then be the tool we use to take these generic yamls and merge/convert them for use in the ctests. This choice makes it easier IMO since only that script needs to contain case-specific file names and paths.

guoqing-noaa commented 1 month ago

@guoqing-noaa I would like to keep the yamls added to obtype_config/ as generic as possible. That way we can apply these yamls and associated tools for the workflow at a later date. As such, I think we should limit hardcoding file names in there.

The gen_yaml_ctest.sh can then be the tool we use to take these generic yamls and merge/convert them for use in the ctests. This choice makes it easier IMO since only that script needs to contain case-specific file names and paths.

I have an alternate view. For those files, it would be the best to keep the "run_time" substitutions as less as possible.

Regarding the rrfs-workflow, we will use the giant XML file directly (it will be generated from RDASApp offline).

SamuelDegelia-NOAA commented 1 month ago

I have an alternate view. For those files, it would be the best to keep the "run_time" substitutions as less as possible.

Regarding the rrfs-workflow, we will use the giant XML file directly (it will be generated from RDASApp offline).

Even during the workflow, we will need at least two runtime substitutions for the yaml: analysis time and observation distribution (for GETKF observer vs. solver). There might be more that I can't think of.

To me, it makes sense to also make the file names generic and replaceable for the time being until we have settled on standardized file names and input directories. But either solution is fine with me and I don't think it should hold up this PR.

guoqing-noaa commented 1 month ago

Even during the workflow, we will need at least two runtime substitutions for the yaml: analysis time and observation distribution (for GETKF observer vs. solver). There might be more that I can't think of.

Right, we will need to substitute the analysis time, the obs distribution and maybe a limited sets of others as needed in the future. But we would like to keep the substitutions as minimum as possible to keep the workflow as simple as possible.

To me, it makes sense to also make the file names generic and replaceable for the time being until we have settled on standardized file names and input directories. But either solution is fine with me and I don't think it should hold up this PR.

I did not hold up this PR. But I do think it is better to get things sorted out as early as possible.

SamuelDegelia-NOAA commented 1 month ago

I did not hold up this PR. But I do think it is better to get things sorted out as early as possible.

I am fine with changing those lines back if that's what you want.

guoqing-noaa commented 1 month ago

I did not hold up this PR. But I do think it is better to get things sorted out as early as possible.

I am fine with changing those lines back if that's what you want.

That will be helpful. Thanks, @SamuelDegelia-NOAA

xyzemc commented 1 month ago

I have replaced the three input pathes and names from 'string' to the actual 'data/obs/filename'. Please re-check and approve it. Thanks!

HaidaoLin-NOAA commented 4 weeks ago

This is a general comment to the group and we can address/discuss this later, don't have to be in this PR. Should we put all radiance bias correction related files, under data/obs/ or put them on a separate path/folder, such as data/satbias/. This is not very important and either way would be fine, but I would prefer to have a separate folder for satbias related files and not put them all under data/obs. Any thoughts/comments? Thanks!

NOAA-EMC / RDASApp

add the amsua_n19_qc_bc.yaml into templates #206