NOAA-EMC / EVS


Feature/refs initial #528

Closed BinbinZhou-NOAA closed 1 month ago

BinbinZhou-NOAA commented 1 month ago

Note to developers: You must use this PR template!

Description of Changes

REFS verification is a new component for EVS v2.
(1) This is the initial version of the REFS component for EVS v2; its contents are very similar to those of the HREF component.
(2) The MET/METplus versions v12/v6 are used.
(3) Restart capabilities are added to both the stats and plots jobs.
(4) REFS is still in the development stage and is routinely run by Matt Pyle.

Please include a summary of the changes and the related GitHub issue(s). Please also include relevant motivation and context. N/A

Developer Questions and Checklist

Testing Instructions

Since REFS is still run by Matt Pyle, for testing, COMINrefs in all of the stat driver scripts should be set to the REFS output directory:
export COMINrefs=/lfs/h2/emc/ptmp/emc.lam/para/com/refs/v1.0

Note: If testing on a personal account, add one of the following lines to the spcoutlook stat driver script:
export EVSINspcotlk=/lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/prep/cam
or
export EVSINspcotlk=/lfs/h1/ops/prod/com/evs/v1.0/prep/cam

Test procedures:

Part 1. For the stats generation jobs

There are 3 stats jobs:
jevs_cam_refs_grid2obs_stats.sh
jevs_cam_refs_precip_stats.sh
jevs_cam_refs_spcoutlook_stats.sh

For each job, the following scenarios should be tested.

For jevs_cam_refs_grid2obs_stats.sh:

Scenario 1: first run, no interruption. In this case, all 3 output stat files (final stat files) should be generated and copied to the final stat directory $COMOUT/cam/refs.$VDATE. Meanwhile, the restart directory "restart" should be created in the small stat directory $COMOUT/cam/atmos.$VDATE/refs/grid2obs, in which all small stat files are saved (same as the old version) for gather processing or for a restart run. The restart directory contains 4 sub-directories: prepare, product, profile, and system.

In the prepare sub-directory, there are 2 *.completed files, gfs_prepbufr.completed and rap_prepbufr.completed, and a sub-directory prepbufr.20240714 in which the prepared prepbufr netCDF files are stored for restart.

In the product, profile, and system sub-directories, there are several *.completed files that indicate which sub-tasks are completed. If the grid2obs job completes without interruption, the .completed files for all of the sub-tasks should be present in these 3 sub-directories.
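The restart layout described for scenario 1 can be spot-checked from the shell. The sketch below is only an illustration, not part of the PR: the function name list_completed is hypothetical, and it assumes the restart directory holds the four sub-directories named above with marker files ending in .completed.

```shell
# Hypothetical helper: report how many *.completed markers each
# restart sub-directory holds. $1 is the restart directory.
list_completed() {
    restart_dir=$1
    for sub in prepare product profile system; do
        if [ -d "$restart_dir/$sub" ]; then
            n=0
            for f in "$restart_dir/$sub"/*.completed; do
                # The glob stays unexpanded when nothing matches,
                # so only count entries that really exist.
                if [ -f "$f" ]; then
                    n=$((n + 1))
                fi
            done
            echo "$sub: $n completed"
        else
            echo "$sub: missing"
        fi
    done
}
```

A possible call, using the path from the description, is list_completed "$COMOUT/cam/atmos.$VDATE/refs/grid2obs/restart". After an uninterrupted run, the prepare line would be expected to report the 2 markers mentioned above.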

Scenario 2: the prepare process and only part of the stats generation are completed. Suppose all processes in profile and system are completed but the processes in product are not. To simulate this scenario:

Step 1: delete all of the output from scenario 1, including the final stat files in $COMOUT/cam/refs.$VDATE, and the small stat files and restart sub-directory in $COMOUT/cam/atmos.$VDATE/refs/grid2obs

Step 2: submit the driver script jevs_cam_refs_grid2obs_stats.sh

Step 3: wait about 1 hour, then kill the grid2obs job

Step 4: re-submit the driver script jevs_cam_refs_grid2obs_stats.sh

Step 5: after it completes, check the final stat files; they should be the same as the corresponding files from scenario 1.
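The comparison in Step 5 could be scripted along the following lines. This is a sketch under assumptions: compare_stat_dirs is a hypothetical name, the scenario 1 files are assumed to have been copied aside before Step 1 deletes them, and the files are assumed to be byte-identical between runs, which the description implies but does not state outright.

```shell
# Hypothetical helper: compare every file in a reference directory
# (saved scenario 1 output, $1) against the same-named file in a
# second directory (restart-run output, $2).
# Returns non-zero if any file is missing or differs.
compare_stat_dirs() {
    ref_dir=$1
    new_dir=$2
    status=0
    for ref in "$ref_dir"/*; do
        # Skip sub-directories and the unexpanded glob of an empty directory.
        [ -f "$ref" ] || continue
        name=$(basename "$ref")
        if [ ! -f "$new_dir/$name" ]; then
            echo "MISSING  $name"
            status=1
        elif cmp -s "$ref" "$new_dir/$name"; then
            echo "SAME     $name"
        else
            echo "DIFFERS  $name"
            status=1
        fi
    done
    return $status
}
```

For example, with the scenario 1 files saved in a scratch directory of your choosing, compare_stat_dirs /path/to/saved_scenario1 "$COMOUT/cam/refs.$VDATE" prints one line per file and returns 0 only when every file matches.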

Repeat the above procedures for the precip and spcoutlook jobs.

Part 2. For the plot generation jobs

Note: since REFS is a new component for EVS, there are no REFS stat files in the vpppg directory /lfs/h2/emc/vpppg/noscrub/emc.vpppg/evs/v2.0/stats/cam, so please set COMIN in the plot driver scripts to /lfs/h2/emc/vpppg/noscrub/binbin.zhou/evs/v2.0/stats/cam

There are 15 jobs: 7 jobs for 31-day score plots, 7 jobs for 90-day score plots, and 1 job for the precip spatial map.

All 15 jobs are fairly fast, with most taking less than 15 min, but all except the spatial map job still have restart capability.

For each job, a restart sub-directory is created in the $COMOUT directory: $COMOUT/atmos.$VDATE/restart. It contains 2 sub-directories, 31 and 90 (for the 31-day and 90-day plots, respectively). Each of them contains 7 sub-directories, one per job:
refs_cape_plots
refs_ctc_plots
refs_ecnt_plots
refs_precip_plots
refs_profile_plots
refs_snowfall_plots
refs_spcoutlook_plots

In each of these directories, all completed png files and their .completed indicator files are stored. In a restart run, each .completed file is checked: if it exists, its corresponding png file is copied to the working directory; otherwise, the png is generated by the job in the working directory. After all png files have either been generated by the job or copied from the restart directory, they are combined into one tar file and saved in the $COMOUT/atmos.$VDATE directory.
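The check-copy-or-generate logic described above can be pictured with a small shell sketch. It is not the EVS implementation: make_plot is a hypothetical stand-in for the real plotting step, and the marker naming (name.completed next to name.png) and the tar file name plots.tar are assumptions made for illustration.

```shell
# Hypothetical stand-in for the real plotting step: writes one
# placeholder "png" for plot name $1 into working directory $2.
make_plot() {
    echo "plot for $1" > "$2/$1.png"
}

# $1 = restart directory, $2 = working directory, remaining args = plot names
collect_plots() {
    restart_dir=$1
    work_dir=$2
    shift 2
    for name in "$@"; do
        if [ -f "$restart_dir/$name.completed" ]; then
            # Finished in an earlier run: reuse the saved png.
            cp "$restart_dir/$name.png" "$work_dir/"
        else
            # Not finished: generate it, then save the png and its marker
            # so that a later restart can skip this plot.
            make_plot "$name" "$work_dir"
            cp "$work_dir/$name.png" "$restart_dir/"
            touch "$restart_dir/$name.completed"
        fi
    done
    # Bundle every png in the working directory into one tar file.
    (cd "$work_dir" && tar -cf plots.tar ./*.png)
}
```

Writing the marker only after the png has been saved means a job killed mid-plot leaves no marker behind, so the restart run regenerates that plot instead of copying a partial file.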

The testing procedures for those 15 jobs depend on their walltime.

(1) The following jobs take less than 1 min:
jevs_cam_refs_grid2obs_cape_past31days_plots.sh
jevs_cam_refs_grid2obs_cape_past90days_plots.sh
jevs_cam_refs_grid2obs_ctc_past31days_plots.sh
jevs_cam_refs_grid2obs_ctc_past90days_plots.sh
jevs_cam_refs_precip_past31days_plots.sh
jevs_cam_refs_precip_past90days_plots.sh

For these jobs the restart testing can be skipped and only the normal runs tested. If the restart capability is tested anyway, the procedure is:
Step 1: launch the driver script, such as jevs_cam_refs_grid2obs_cape_past31days_plots.sh
Step 2: wait about 30 or 40 seconds, kill the job, and re-run it
Step 3: check that the final tar file is correct

(2) The other jobs have walltimes between 2 and 15 min:
jevs_cam_refs_profile_past31days_plots.sh
jevs_cam_refs_profile_past90days_plots.sh
jevs_cam_refs_grid2obs_ecnt_past31days_plots.sh
jevs_cam_refs_grid2obs_ecnt_past90days_plots.sh
jevs_cam_refs_snowfall_past31days_plots.sh
jevs_cam_refs_snowfall_past90days_plots.sh
jevs_cam_refs_spcoutlook_past31days_plots.sh
jevs_cam_refs_spcoutlook_past90days_plots.sh

The testing procedure is similar to the above, but kill the job after waiting 1 or 2 minutes.

(3) jevs_cam_refs_precip_spatial_plots.sh: this job has no restart capability since it takes less than 1 min.

malloryprow commented 1 month ago

@BinbinZhou-NOAA I started reviewing the code changes for the dev/, ecf/, and jobs/ and left comments for things to be changed. Please make the changes and I'll continue the review.

BinbinZhou-NOAA commented 1 month ago

Mallory,

Thanks for these comments, I'll update them accordingly.

Binbin

--

Binbin Zhou

Physical Scientist

Lynker at NOAA/NWS/NCEP/EMC

5830 University Research Ct.

College Park, MD 20740

@.***

301-683-3683

malloryprow commented 1 month ago

Thanks! Can you also remove all instances of LOOP_ORDER in the METplus config files?
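One way to carry out this kind of cleanup in bulk is sketched below. It is an illustration only: strip_loop_order is a hypothetical name, and the sketch assumes each setting sits on its own line in the form LOOP_ORDER = value inside files ending in .conf.

```shell
# Hypothetical helper: delete LOOP_ORDER assignments from every
# *.conf file under the directory given as $1.
strip_loop_order() {
    conf_root=$1
    pattern='^[[:space:]]*LOOP_ORDER[[:space:]]*='
    # List only the conf files that actually contain the setting.
    find "$conf_root" -type f -name '*.conf' -exec grep -l "$pattern" {} + |
    while IFS= read -r conf; do
        # Write the filtered copy first, then overwrite the original in
        # place so its permissions are kept. grep -v exits non-zero when
        # no lines remain, hence the "|| true".
        grep -v "$pattern" "$conf" > "$conf.tmp" || true
        cat "$conf.tmp" > "$conf"
        rm -f "$conf.tmp"
        echo "cleaned $conf"
    done
}
```

From the top of a clone this might be run as strip_loop_order parm/metplus_config, followed by git diff to review exactly which lines were removed before committing.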

BinbinZhou-NOAA commented 1 month ago

Mallory,

All of the LOOP_ORDER instances in the METplus conf files have been removed. The driver and ecf scripts, and JOB files have been updated on the feature/refs_initial branch.

Binbin

malloryprow commented 1 month ago

Something has gone awry with this PR again. There are changes to NFCENS and RTOFS related files.

BinbinZhou-NOAA commented 1 month ago

Mallory,

I didn't touch any of the NFCENS and RTOFS related files. Which files have gone away?

Binbin

malloryprow commented 1 month ago

You merged develop into feature/refs_initial. You should not be doing that. You should be merging feature/rrfs_refs_v1 into your branch feature/refs_initial. https://github.com/NOAA-EMC/EVS/commit/16394a0e13f440021f30abd4a1dfccb82fb947f5

BinbinZhou-NOAA commented 1 month ago

Mallory,

Then how do I reverse it on the remote branch? I have not yet run "git pull" locally.

Thanks!

Binbin

malloryprow commented 1 month ago

Please try this while in the location of the feature branch feature/refs_initial on WCOSS2

  1. git fetch
  2. git pull origin feature/refs_initial
  3. git reset --hard HEAD~1
  4. git push origin feature/refs_initial

BinbinZhou-NOAA commented 1 month ago

Mallory,

I just did these 4 steps, but I am not sure whether the wrong commit has been reverted.

Binbin

malloryprow commented 1 month ago

I'm still seeing the NFCENS and RTOFS files from your merge of develop in this PR. Your merge of develop into feature/refs_initial is still the last commit of the branch.
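Whether a merge commit is still the tip of a branch can also be checked from the command line. The sketch below is an editorial illustration: tip_is_merge is a hypothetical helper built on git rev-list --parents, which prints a commit followed by its parents. As general git behavior, git reset --hard only moves the local branch, and a plain git push of a branch that is then behind its remote is normally rejected as non-fast-forward unless forced, which may be why the merge remained on the remote branch.

```shell
# Hypothetical helper: succeed if the commit at ref $1 is a merge
# commit, i.e. it has more than one parent.
tip_is_merge() {
    # rev-list --parents prints "<commit> <parent1> [<parent2> ...]",
    # so more than 2 words means at least 2 parents.
    words=$(git rev-list --parents -n 1 "$1" | wc -w)
    [ "$((words))" -gt 2 ]
}
```

After git fetch origin, calling tip_is_merge origin/feature/refs_initial would return success while the merge of develop is still the last commit of the remote branch.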

malloryprow commented 1 month ago

Please try this

  1. git fetch
  2. git pull origin feature/refs_initial
  3. git revert -m 1 16394a0e13f440021f30abd4a1dfccb82fb947f5
  4. git push origin feature/refs_initial

BinbinZhou-NOAA commented 1 month ago

Mallory,

I see this notice (in the yellow bar at the top) for the commit 16394a0 https://github.com/NOAA-EMC/EVS/commit/16394a0e13f440021f30abd4a1dfccb82fb947f5

"This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository". I also used "git log" to search for this commit ID but could not find it, so there is no way to revert it from the feature/refs_initial branch locally. If rolling back this accidental commit is too complex, the easiest way is to close this PR and create a new one for REFS.

Binbin

malloryprow commented 1 month ago

Make sure the new branch for the PR is off of feature/rrfs_refs_v1. In the future, do not merge changes from develop into your refs branch.

BinbinZhou-NOAA commented 1 month ago

Mallory,

Sure, new experience learned! Thanks!

Binbin

BinbinZhou-NOAA commented 1 month ago

Mallory,

Should LOG_LEVEL = DEBUG and LOG_ENSEMBLE_STAT_VERBOSITY = 2 be removed from all of the METplus conf files?

Thanks!

Binbin

On Tue, Aug 27, 2024 at 12:18 PM Mallory Row wrote:

On parm/metplus_config/stats/cam/grid2obs/EnsembleStat_fcstREFS_obsPREPBUFR_PROFILE.conf https://github.com/NOAA-EMC/EVS/pull/528#discussion_r1733184754:

Remove LOG_LEVEL = DEBUG and LOG_ENSEMBLE_STAT_VERBOSITY = 2. Logging information is controlled via machine.conf.

malloryprow commented 1 month ago

Yes, all logging is controlled by machine.conf and logging settings should not be used in the component METplus conf files.
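A quick audit for leftover logging settings might look like the sketch below. It rests on assumptions: find_logging_overrides is a hypothetical name, the settings are assumed to appear as LOG_LEVEL = value or LOG_<TOOL>_VERBOSITY = value on their own lines, and any file named machine.conf is skipped because that is where logging is meant to be configured.

```shell
# Hypothetical audit: list *.conf files under $1 (other than
# machine.conf) that still set LOG_LEVEL or a LOG_*_VERBOSITY option.
# Prints nothing when no component conf file overrides logging.
find_logging_overrides() {
    find "$1" -type f -name '*.conf' ! -name 'machine.conf' \
        -exec grep -lE '^[[:space:]]*(LOG_LEVEL|LOG_[A-Z0-9_]+_VERBOSITY)[[:space:]]*=' {} +
    # grep exits non-zero when nothing matches; that is the clean case here.
    return 0
}
```

Running find_logging_overrides parm/metplus_config from the top of a clone would list the files that still need the two settings removed.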