RussTreadon-NOAA closed 2 days ago
FYI @guillaumevernieres & @AndrewEichmann-NOAA
@guillaumevernieres FYI
Work is in progress to copy obs and fix files to glopara space that should hopefully move this issue towards resolution.
@guillaumevernieres Can you confirm that /scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240802 contains what you have in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static? I did a diff and see a few things different but am not sure if they need to be in the glopara fix copy or not:
[Kate.Friedman@hfe01 glopara]$ diff -r /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/ fix/gdas/soca/20240802/
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca: MOM_input~
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/1440x1080x75/soca: bkgerr
Only in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca: bkgerr
Only in fix/gdas/soca/20240802/common: #fields_metadata.yaml#
Thanks!
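Two of the entries in the diff above (MOM_input~ and #fields_metadata.yaml#) are editor leftovers, and they can be filtered out so only the meaningful differences remain. A minimal sketch, assuming GNU diff (the -x option is not part of POSIX diff); the directory and file names below are throwaway stand-ins, not the real fix paths:

```shell
#!/bin/sh
# Build two small throwaway trees that mimic the situation (names are illustrative).
work=$(mktemp -d)
mkdir -p "$work/src/soca" "$work/src/common" "$work/copy/soca" "$work/copy/common"
echo "input" > "$work/src/soca/MOM_input"
echo "input" > "$work/copy/soca/MOM_input"
echo "old"   > "$work/src/soca/MOM_input~"                 # editor backup file
echo "tmp"   > "$work/copy/common/#fields_metadata.yaml#"  # editor autosave file
mkdir "$work/src/soca/bkgerr"                              # a genuine extra directory

# -r recurses; each -x skips entries whose base name matches the pattern.
# diff exits non-zero when the trees differ, hence the "|| true".
report=$(cd "$work" && diff -r -x '*~' -x '#*#' src copy || true)
echo "$report"

rm -rf "$work"
```

With the two patterns excluded, only the bkgerr directory should be reported for these sample trees.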
Thanks for checking @KateFriedman-NOAA. fix/gdas/soca/20240802/common: #fields_metadata.yaml# can be removed and the rest of the diff ignored.
However, it looks like the links to the common files were not preserved in your copy.
@guillaumevernieres Thanks for the feedback! I have removed the #fields_metadata.yaml# file from soca/20240802/common.
The original gdas/soca fix copy (20240624) that was provided to us only had symlinks for the rossrad.nc file, so that's what we have:
[role.glopara@hfe06 20240802]$ pwd
/scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240802
[role.glopara@hfe06 20240802]$ ll */soca | grep rossrad.nc
lrwxrwxrwx 1 role.glopara global 23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global 23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global 23 Jun 27 14:15 rossrad.nc -> ../../common/rossrad.nc
lrwxrwxrwx 1 role.glopara global 23 Jun 27 14:16 rossrad.nc -> ../../common/rossrad.nc
In comparing the files to check if they are identical and can be made back into symlinks (like in your set) I find this difference:
[role.glopara@hfe06 20240802]$ /apps/nccmp/1.9.1/gcc-13.2.0/bin/nccmp -dgB 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc common/RECCAP2_region_masks_all_v20221025.nc
DIFFER : VARIABLE : lon : ATTRIBUTE : _FillValue : VALUES : nan <> nan
I can either leave the 20240802 set as is or create a new timestamp to set up those symlinks (I don't want to touch the used files in the 20240802 set since it is currently "live" in global-workflow develop). Let me know your thoughts, thanks!
@guillaumevernieres Gentle poke about my comment/question above. If we end up needing another soca timestamp I'd like to include it in the PR that I'm ready to open to resolve this issue. Let me know, thanks!
Sorry for the late reply @KateFriedman-NOAA. A new timestamp with the symlinks sounds good to me. I'll check what version of RECCAP2_region_masks_all_v20221025.nc we should use.
@guillaumevernieres Okie dokie, I'll make a new timestamp once you confirm the file source. Please also check the other files; I only compared a few, but the ones I compared all reported a similar small difference. Thanks!
@KateFriedman-NOAA, let's use 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc as the common file.
@RussTreadon-NOAA I am trying to run the C48mx500_3DVarAOWCDA CI test on WCOSS2-Cactus to confirm that my updates to resolve the hardcoded paths are good. The gdasmarinebmat job is failing; I'm not sure if this is expected/known. The gdasprepoceanobs job ran and succeeded. The gdasocnanalprep job has not yet run. Would you mind taking a look?
HOMEgfs: /lfs/h2/emc/global/noscrub/kate.friedman/git/feature-experimental_obs_path
EXPDIR: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/EXPDIR/testcyc_C48_S2S
COMROT: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S
log: /lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S/logs/2021032418/gdasmarinebmat.log.0
Note, the first attempt at the gdasmarinebmat job hit the walltime. I increased the walltime and let it try again. It's currently hung and will hit the new walltime.
I can at least confirm that DMPDIR is being set correctly via the defaults.yaml now but haven't been able to run the job that uses it yet:
kate.friedman@clogin03:/lfs/h2/emc/ptmp/kate.friedman/comrot/RUNTESTS/COMROOT/testcyc_C48_S2S> grep DMPDIR= logs/2021032418/gdasprepoceanobs.log
+++ config.base[49]: export DMPDIR=/lfs/h2/emc/dump/noscrub/dump
+++ config.base[49]: DMPDIR=/lfs/h2/emc/dump/noscrub/dump
+++ config.prepoceanobs[17]: export DMPDIR=/lfs/h2/emc/global/noscrub/emc.global/data/experimental_obs
+++ config.prepoceanobs[17]: DMPDIR=/lfs/h2/emc/global/noscrub/emc.global/data/experimental_obs
@KateFriedman-NOAA: I am no longer able to run C48mx500_3DVarAOWCDA from g-w PR #2875 after updating the gdas.cd hash. This is a known issue. g-w PR #2920 needs to be merged into g-w develop. After this I can bring the updated g-w develop into g-w PR #2785 and C48mx500_3DVarAOWCDA should work again.
I don't know if this impacts your test. Tagging @guillaumevernieres
Correct @RussTreadon-NOAA, the WCDA test won't work with the new gdas.cd hash.
Okie dokie, thanks @RussTreadon-NOAA @guillaumevernieres for confirming the failure is expected. Let me know if it would help to merge the changes I've prepped for this issue into that open PR. My changes are here: https://github.com/NOAA-EMC/global-workflow/compare/develop...KateFriedman-NOAA:global-workflow:feature/experimental_obs_path . Otherwise I'll wait to retest and submit this via PR after that other PR goes in.
I think the WCDA test works with g-w PR #2920. If true, we should merge #2920 first. After this we can update other g-w PRs.
@guillaumevernieres Done:
[role.glopara@hfe02 20240919]$ pwd
/scratch1/NCEPDEV/global/glopara/fix/gdas/soca/20240919
[role.glopara@hfe02 20240919]$ rsync -azv 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc common/RECCAP2_region_masks_all_v20221025.nc
sending incremental file list
RECCAP2_region_masks_all_v20221025.nc
sent 22,039 bytes received 35 bytes 44,148.00 bytes/sec
total size is 58,414 speedup is 2.65
...and then I removed the other copies and made symlinks to the common one:
[role.glopara@hfe02 20240919]$ ll common/RECCAP2_region_masks_all_v20221025.nc
-rw-r--r-- 1 role.glopara global 58414 May 14 19:37 common/RECCAP2_region_masks_all_v20221025.nc
[role.glopara@hfe02 20240919]$ ll */soca/RECCAP2*
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 1440x1080x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 360x320x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:08 4500x3297x75/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
lrwxrwxrwx 1 role.glopara global 50 Sep 20 13:09 72x35x25/soca/RECCAP2_region_masks_all_v20221025.nc -> ../../common/RECCAP2_region_masks_all_v20221025.nc
Are there any other files to adjust in the new gdas/soca/20240919 set?
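For the remaining files, the same copy-then-symlink step could be scripted. The following is only a sketch under stated assumptions: it runs in a throwaway directory, the file name mask.nc and the two resolution directories are illustrative, and it uses a byte-for-byte cmp check, which is stricter than the nccmp comparison shown earlier in this thread. A copy is replaced with a relative symlink only when it is identical to the common file; anything that differs is left in place for a manual decision.

```shell
#!/bin/sh
# Throwaway layout standing in for a fix set; all names here are illustrative.
work=$(mktemp -d)
file=mask.nc
mkdir -p "$work/common"
for res in 72x35x25 360x320x75; do
    mkdir -p "$work/$res/soca"
    echo "same bytes" > "$work/$res/soca/$file"
done
cp "$work/72x35x25/soca/$file" "$work/common/$file"
echo "different" > "$work/360x320x75/soca/$file"   # simulate a copy that has drifted

# Replace a per-resolution copy with a relative symlink only if it is
# byte-identical to the common file; otherwise leave it for manual review.
linked=0
skipped=0
for copy in "$work"/*/soca/"$file"; do
    if cmp -s "$copy" "$work/common/$file"; then
        ln -sfn "../../common/$file" "$copy"
        linked=$((linked + 1))
    else
        echo "differs, left in place: $copy"
        skipped=$((skipped + 1))
    fi
done

# Record where the new link points before cleaning up.
target=$(readlink "$work/72x35x25/soca/$file")
echo "linked=$linked skipped=$skipped target=$target"
rm -rf "$work"
```

Using a relative target (../../common/...) matches the existing rossrad.nc links, so the set stays valid if the whole timestamp directory is copied to another machine.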
Merging work prepped to resolve this issue into PR #2920.
What is wrong?
The ocnanal and prepoceanobs sections of parm/config/gfs/yaml/defaults.yaml contain Hera specific paths.
The above caused gdasocnanalprep from C48mx500_3DVarAOWCDA CI to fail on Cactus.
The hardwired paths will also impact this job on other non-Hera platforms (e.g., Orion, Hercules, Jet, ...).
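One quick way to locate such entries is to search the YAML for the /scratch1 and /scratch2 prefixes that the Hera paths in this issue share. This is a sketch only: the sample file and its keys (example_fix_dir, example_obs_dir, example_flag) are hypothetical and are not the real contents of defaults.yaml.

```shell
#!/bin/sh
# Stand-in for a defaults file; keys and values are made up for illustration.
work=$(mktemp -d)
cat > "$work/defaults.yaml" <<'EOF'
ocnanal:
  example_fix_dir: "/scratch2/some/user/data/static"
prepoceanobs:
  example_obs_dir: "/scratch1/some/group/experimental_obs"
  example_flag: 1
EOF

# List every line that hardwires a /scratch1 or /scratch2 path, with line numbers.
hits=$(grep -nE '/scratch[12]/' "$work/defaults.yaml")
echo "$hits"

# Count the matching lines, e.g. to fail a check when any remain.
count=$(grep -cE '/scratch[12]/' "$work/defaults.yaml")
echo "hardwired paths found: $count"
rm -rf "$work"
```

Pointing the same grep at the real file would need the pattern adjusted if other machine-specific prefixes are in use.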
What should have happened?
gdasocnanalprep should run to completion on WCOSS2
What machines are impacted?
WCOSS2, Orion, Hercules, Jet, Cloud
Steps to reproduce
gdasprepoceanobs will fail with a wxflow not found error. Fix this by adding wxflow to PYTHONPATH. The same change needs to be added to gdasocnanalprep. After defining wxflow in gdasocnanalprep, the job will run up to the error
Additional information
Comments in parm/config/gfs/yaml/defaults.yaml note that we need to copy files in /scratch2/NCEPDEV/ocean/Guillaume.Vernieres/data/static/72x35x25/soca to g-w space on supported platforms.
More challenging is deciding how best to replicate DMPDIR=/scratch1/NCEPDEV/global/glopara/data/experimental_obs on supported platforms.
Do you have a proposed solution?
No response