NOAA-EMC / global-workflow

Global Superstructure/Workflow supporting the Global Forecast System (GFS)
https://global-workflow.readthedocs.io/en/latest
GNU Lesser General Public License v3.0

Port global-workflow to Jet #357

Closed KateFriedman-NOAA closed 1 year ago

KateFriedman-NOAA commented 3 years ago

This issue will document efforts to port global-workflow to Jet. @DavidHuber-NOAA will be working on porting to Jet.

Previous work by @lgannoaa to port free-forecast mode was successful. @lgannoaa, please send your Jet port changes back to the respective repos, including global-workflow. @DavidHuber-NOAA will use those as a starting point. Thanks!

We will need to establish glopara installs as on other supported machines (@KateFriedman-NOAA can assist with this):

DavidHuber-NOAA commented 3 years ago

@lgannoaa The gldas build modules all look correct to me. I did a test build with them and everything built fine.

By the way, I have not been able to start this work on Jet just yet. I'm finishing up the S4 port, but I should be able to move to this in 2-3 weeks.

lgannoaa commented 3 years ago

@DavidHuber-NOAA Thank you for verifying.

DavidHuber-NOAA commented 3 years ago

@lgannoaa I may have spoken too soon. On S4 (#138), I was having issues with the gdasgldas job crashing. The fix there was to update the esmf library to 8_1_1 in both gdas2gldas.s4 and gldas2gdas.s4; the reason is that module_base.s4 is sourced, which loads esmf/8_1_1. However, I notice that this mismatch is present in the gdas2gldas and gldas2gdas build scripts for all machines. @KateFriedman-NOAA Should these be updated for all machines?

KateFriedman-NOAA commented 3 years ago

@DavidHuber-NOAA What tag of GLDAS are you using? The current tag of GLDAS in the develop branch (gldas_gfsv16_release.v1.15.0) uses esmf/8_1_0_beta_snapshot_27 from hpc-stack. For Jet, the GLDAS will just need new modulefiles that point to the same libraries through the hpc-stack installation there. In theory the new modulefile for gdas2gldas (gdas2gldas.jet) would be identical to the others but with the Jet hpc-stack installation path:

module use /lfs4/HFIP/hfv3gfs/nwprod/hpc-stack/libs/modulefiles/stack

(Note: the official hpc-stack installation locations are listed here: https://github.com/NOAA-EMC/hpc-stack/wiki/Official-Installations)

You should be able to just copy the hera one, change the module use line, and it will build (this work would happen in a branch in the GLDAS repo). The Jet part of sorc/machine-setup.sh and the module_base.jet in global-workflow will need to be updated/created too. Here is the gdas2gldas.hera modulefile from that tag as an example:

https://github.com/NOAA-EMC/GLDAS/blob/850af240db234b1bc8894e85650164d15ca344af/modulefiles/gdas2gldas.hera
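
For illustration only, a hypothetical gdas2gldas.jet would differ from the Hera file mainly in the stack path; the compiler/MPI/library module versions below are placeholders and should be copied from gdas2gldas.hera rather than taken from this sketch:

module use /lfs4/HFIP/hfv3gfs/nwprod/hpc-stack/libs/modulefiles/stack
module load hpc/1.1.0                      # placeholder version; mirror gdas2gldas.hera
module load hpc-intel/18.0.5.274           # placeholder version; mirror gdas2gldas.hera
module load hpc-impi/2018.4.274            # placeholder version; mirror gdas2gldas.hera
module load esmf/8_1_0_beta_snapshot_27    # per the gldas_gfsv16_release.v1.15.0 tag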

Let me know if I didn't answer your question.

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA I am using the gldas_gfsv16_release.v1.15.0 tag. You're correct that the modulefiles in GLDAS all reference esmf/8_1_0_beta_snapshot_27. What I was asking is whether this should be incremented to esmf/8_1_1, since the workflow job gdasgldas (i.e. gldas.sh) calls load_fv3gfs_modules.sh, which loads module_base.<machine>, which in turn loads the esmf/8_1_1 module.

On S4, this setup was causing the gdas2gldas executable to crash in ESMF. Incrementing the build scripts to esmf/8_1_1 fixed the crash.

KateFriedman-NOAA commented 3 years ago

@DavidHuber-NOAA Oh, I see what you're saying now. Hmmmm, all of the components in develop have a mix of esmf/8_1_0_beta_snapshot_27 and esmf/8_1_1. I can't make the decision for the GLDAS folks, but if esmf/8_1_1 works on S4 then it should likely work elsewhere. OK, when you make the S4/Jet changes in a GLDAS branch, change the esmf module to esmf/8_1_1; it can then be discussed in the PR review.

We will be doing an overhaul of modules in the workflow in the near future, so we may push all components to esmf/8_1_1; I can't say for sure yet, though, since it'll be up to the component managers. The idea will be to move away from workflow-level modulefiles and rely on the component settings, so this may become a moot point in the not-too-distant future.

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA Alright, sounds like a plan. Thanks!

GeorgeGayno-NOAA commented 3 years ago

@KateFriedman-NOAA I opened an issue for porting the gdas_init scripts to Jet. I have done no work on it yet. Should I prioritize it now?

https://github.com/NOAA-EMC/UFS_UTILS/issues/544

KateFriedman-NOAA commented 3 years ago

@GeorgeGayno-NOAA Yes, please now prioritize that task and keep @DavidHuber-NOAA up-to-date so he can test it when ready. UFS_UTILS is already working well enough to run the init job via global-workflow on Jet so we'll just need the gdas_init offline feature. Thanks!

KateFriedman-NOAA commented 3 years ago

@DavidHuber-NOAA I have installed the new obsproc packages on Jet that are going into operations next week (Issue #341). The updates for the new versions (HOMEobsproc* variables in config.base) will be going into the develop branch shortly after the implementation. I installed them in the existing glopara account space so, as long as BASE_GIT is correct for Jet, you shouldn't have to do anything for obsproc on Jet (except test the prep jobs). For Jet: BASE_GIT=/lfs4/HFIP/hfv3gfs/glopara/git

FYI, here are the installs:
/lfs4/HFIP/hfv3gfs/glopara/git/obsproc/obsproc_global.v3.4.2_hpc-stack
/lfs4/HFIP/hfv3gfs/glopara/git/obsproc/obsproc_prep.v5.5.0_hpc-stack
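
Something along these lines in config.base should cover it (a sketch only; the HOMEobsproc_network and HOMEobsproc_prep names follow the HOMEobsproc* convention mentioned above and should be confirmed against config.base, while the paths are the installs listed here):

export BASE_GIT="/lfs4/HFIP/hfv3gfs/glopara/git"
export HOMEobsproc_network="${BASE_GIT}/obsproc/obsproc_global.v3.4.2_hpc-stack"
export HOMEobsproc_prep="${BASE_GIT}/obsproc/obsproc_prep.v5.5.0_hpc-stack"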

Let me know if you have any questions or issues whenever you get to testing the prep jobs in cycled mode via global-workflow on Jet. Thanks!

DavidHuber-NOAA commented 3 years ago

Will do, thanks @KateFriedman-NOAA!

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA Is there a rstprod directory for prepbufr data on Jet (i.e. GESROOT)?

KateFriedman-NOAA commented 3 years ago

@KateFriedman-NOAA Is there a rstprod directory for prepbufr data on Jet (i.e. GESROOT)?

@DavidHuber-NOAA For GESROOT on Jet please set it to null like we're doing on Orion: export GESROOT=/dev/null

The GESROOT variable is specific mainly to WCOSS (it comes from the prod_envir module) but is also supported on Hera because NCO has a mirror of some production data from /com being pushed there. Now that we're expanding out to more machines we should try to make this section of config.prepbufr more generic.

Let's try something like this:

export GESROOT=${GESROOT:-"/dev/null"}    # keep the value from prod_envir on WCOSS, otherwise default to null
if [ $machine = "HERA" ]; then
    export GESROOT="/scratch1/NCEPDEV/rstprod"
fi

When run on WCOSS, after the prod_envir module is loaded, the first line should pick up the path from the module. On Hera it should get the rstprod path, and on other platforms it should be set to the null path since /com or a /com mirror isn't currently available elsewhere.

When your Jet work goes into PR I can run your branch on WCOSS to double check this adjustment doesn't mess anything up there. We'll want a similar test on Hera for the same reason. Thanks!

DavidHuber-NOAA commented 3 years ago

I have a working test build on Jet that has successfully run through 5 GDAS and 2 GFS cycles at C192/C96/L127 with 20 ensemble members. The only errors I've received are related to the TC_Tracker software and to some HPSS errors caused by Jet's network problems yesterday and today. The jobs should run over the weekend, at which point I can see how the gdasgldas job ran.

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA So far, I have only been running on xjet, but I have written up the resources for the other partitions. Currently, I'm disabling C768 runs on ujet and vjet as the gfsfcst job is too large to run on ujet and would take up all of vjet. The analysis jobs are also problematic for these partitions. I could adjust threading and/or layouts in config.fv3 to allow C768 forecasts, but that would also probably require that the forecast hours be reduced (on S4, we can only run out to 168 hours @C384 with 1 thread).

Do you think I should keep C768 disabled on ujet and vjet or try to play around with the threading and/or layouts?

KateFriedman-NOAA commented 3 years ago

@DavidHuber-NOAA Thanks for this update! Since builds on the login nodes target xjet and kjet by default (which is what most folks will do), I'd like to make those two partitions our main focus for this task. As you've noted, the other partitions are too small for our largest jobs at the higher resolutions; they are all smaller than Hera, which is our smallest supported HPC right now with 24 cores/node.

Having said that though, I'd like to document the resource settings you've compiled and we can have a separate issue to extend support to other partitions after committing the cycled Jet support into develop. Question...how different are the resources for the other partitions? Is it more than just changing the max node value and letting the inner calculations happen...or are there additional adjustments? I'm trying to gauge how complicated the configs/resource settings will be to support the smaller partitions.

Thanks!

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA OK, that's easy enough. I will adjust the config scripts for just those two, though I will note that @lgannoaa added support for vjet and sjet for free-forecast mode. I will leave that alone and add cycled-mode support to just the xjet and kjet partitions.

For all of the partitions except kjet, some adjustments to resources are necessary for the gdaseobs and (gdas|gfs)anal jobs -- most importantly eobs. At least one script, exglobal_atmos_analysis.sh, requires the number of cores actually allocated to match the number of cores requested when creating/linking directories. For instance, if gdaseobs is run at C384 on xjet (default npe=100, nth=2), it lands on 9 nodes, or 216 cores, while only 200 were requested. This mismatch causes an instability in which some of the data generated by gdaseobs is not copied back into the ROTDIR. So npe must be set so that the requested cores exactly fill the allocated nodes; for the case above, npe=108. @CoryMartin-NOAA may be able to elaborate if necessary.
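
A minimal sketch of the arithmetic behind the C384/xjet example above (illustration only, not taken from the workflow scripts; xjet has 24 cores/node):

# requested: npe=100 tasks, nth=2 threads each, on 24-core xjet nodes
cores_per_node=24
npe=100
nth=2
nodes=$(( (npe * nth + cores_per_node - 1) / cores_per_node ))  # ceil(200/24) = 9 nodes
allocated_cores=$(( nodes * cores_per_node ))                   # 9 * 24 = 216 cores
adjusted_npe=$(( allocated_cores / nth ))                       # 216 / 2 = 108 tasks
echo "npe should be set to ${adjusted_npe}"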

Besides the resources, it may also be necessary to modify some of the build scripts to use the correct instruction sets for each architecture. The Jet documentation lists the recommended Intel flags as -axSSE4.2,AVX,CORE-AVX2,CORE-AVX512 -align array64byte -ip.
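
As a rough sketch of where those flags might be injected (the FFLAGS hook is illustrative only; each component's build system wires compiler flags differently):

# Jet-recommended Intel flags from the documentation cited above
export JET_ARCH_FLAGS="-axSSE4.2,AVX,CORE-AVX2,CORE-AVX512 -align array64byte -ip"
export FFLAGS="${JET_ARCH_FLAGS} ${FFLAGS:-}"   # hypothetical hook; adjust per build script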

KateFriedman-NOAA commented 3 years ago

added support for vjet and sjet for free-forecast mode. I will leave that alone and add cycled-mode support to just the xjet and kjet partitions.

@DavidHuber-NOAA Great, thanks! Thanks also for the info on resource adjustments for some jobs on the other partitions and for the build instruction adjustments! I'll discuss this with my fellow workflow code managers and see how we'd like to proceed with supporting the other partitions beyond xjet and kjet.

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA Another job that will need modified resources for different partitions is gdaseupd. So far, xjet is able to run the job without modification, though I've only tested it at C192/C96. For kjet, I had to increase threads to 4 to run the C384/C192 as sacct showed a MAXRSS of ~8GB with 20 ensemble members. Do you know the approximate memory footprint of this job at various resolutions? If so, I can calculate what the threads will need to be for each partition. Otherwise, I will just test and adjust as necessary. Thanks!
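
For reference, the kjet adjustment amounts to something like the following in config.resources (a sketch only; the nth_eupd/npe_node_eupd names follow the existing resource-variable convention, and the partition check is illustrative rather than the final implementation):

if [[ "${PARTITION_BATCH:-}" = "kjet" ]]; then
    export nth_eupd=4                              # keeps ~8 GB/task under the 96 GB/node limit
    export npe_node_eupd=$(( 40 / nth_eupd ))      # kjet has 40 cores/node
fi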

KateFriedman-NOAA commented 3 years ago

@CoryMartin-NOAA @RussTreadon-NOAA What are the memory needs of the eupd job/execs? Dave is testing the cycled system on the kjet partition on Jet now after testing the xjet partition. Thanks!

kjet has 40ppn and 96 GB memory/node (similar to Hera). Here is the partition info for Jet: https://jetdocs.rdhpcs.noaa.gov/wiki/index.php/Jet_System_Overview

DavidHuber-NOAA commented 3 years ago

@KateFriedman-NOAA @CoryMartin-NOAA @RussTreadon-NOAA Another job that is hitting memory limits on xjet is gdasanalcalc, in particular interp_inc.x, when running 10 tasks/node @768/384. I have a solution for xjet in calcanl_gfs.py that spreads each instance of interp_inc.x across 2 nodes. If we end up porting to other partitions, this can be modified to spread across more nodes, if necessary. I suspect 768/384 will only be run on xjet and kjet, but it would be good to know what the memory requirements are for 384/192 since some of the older partitions have less memory per node.

DavidHuber-NOAA commented 3 years ago

I've successfully run a 384/192 on kjet for 5 cycles, including 2 GFS cycles, without any issues. I will be starting a longer run today on xjet.

That said, I have so far only been running standard options. Is there a desire to have support for DO_BUFRSND, DO_GEMPAK, DO_AWIPS, WAFSF, and/or any other features on Jet?

KateFriedman-NOAA commented 3 years ago

I've successfully run a 384/192 on kjet for 5 cycles, including 2 GFS cycles, without any issues. I will be starting a longer run today on xjet.

That said, I have so far only been running standard options. Is there a desire to have support for DO_BUFRSND, DO_GEMPAK, DO_AWIPS, WAFSF, and/or any other features on Jet?

@DavidHuber-NOAA Thanks for the kjet update! DO_BUFRSND, DO_GEMPAK, DO_AWIPS, and WAFSF are only required on the production machines (WCOSS), so you don't need to test them on Jet. We'll want to test the tracker, though; let me get that installed and give you the location when it's ready. That's one of my checklist items on this issue.
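
For the record, keeping those switched off on Jet would just look like the usual config.base toggles (sketch only; the WAFSF handling may differ from the DO_* switches):

export DO_BUFRSND="NO"
export DO_GEMPAK="NO"
export DO_AWIPS="NO"
export WAFSF="NO"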

DavidHuber-NOAA commented 3 years ago

Issues related to this port:

Port GLDAS: NOAA-EMC/GLDAS#25
Update the GSI and enable regression tests: NOAA-EMC/GSI#215
Port the UPP (completed): NOAA-EMC/UPP#380 & NOAA-EMC/UPP#381

DavidHuber-NOAA commented 3 years ago

During testing on kjet (and replicated on Hera), I found that the C48 efcs jobs required more memory to run in some cases. At the current resource allocations, some members will pass and some will fail with segmentation faults. For kjet, which has the same memory-per-core ratio as Hera (2.4 GB), I changed nth_fv3 and nth_fv3_gfs to 2, which resolved the issue. Should the default be changed for all systems or just Hera and Jet?

KateFriedman-NOAA commented 3 years ago

I changed nth_fv3 and nth_fv3_gfs to 2, which resolved the issue. Should the default be changed for all systems or just Hera and Jet?

@DavidHuber-NOAA Yes, please change the default on all systems to use nth_fv3[_gfs]=2 for C48 fcst. We don't test that resolution often so the change is likely needed everywhere. Thanks!
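
A minimal sketch of that change in the C48 branch of the resolution case block in config.fv3 (only the threading lines are shown; the surrounding case structure is illustrative):

case ${case_in:-C48} in
    "C48")
        export nth_fv3=2        # bumped to 2 per the efcs memory failures described above
        export nth_fv3_gfs=2
        ;;
esac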

DavidHuber-NOAA commented 3 years ago

The C768 gfsfcst ran into some issues on Jet when performing parallel netCDF atm writes (sfc writes were performed serially for all tests). The parallel writes would sometimes stall at irregular intervals, resulting in timed-out forecasts. Rerunning the forecast for the same cycle generated all of the expected netCDF files, so it appears to be a transient, non-repeatable issue. I observed this problem on xjet and @guoqing-noaa reported similar issues on kjet. Users should expect to have to rerun C768 forecasts if the jobs fail.

As a test, the C768 forecast was run a total of 4 times with serial netCDF writes on xjet. These all completed successfully without any stalled netCDF writes.
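
If a fallback is needed, the workaround used in these tests was simply to switch the atm history output back to serial writes; as a hedged sketch (the OUTPUT_FILE knob and its values are assumptions about the write-component configuration and should be checked against config.fcst/model_configure):

export OUTPUT_FILE="netcdf"    # assumed knob; "netcdf_parallel" would re-enable parallel writes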

DavidHuber-NOAA commented 3 years ago

The last three subcomponents have now been ported to Jet at the following commits:

GSI: 2f28fbf9cfc895a649b2517a25733e0e97a9d3d2
GLDAS: d1e437e4959639ddcdb069e6006f9fd77341d76d
UPP: 823343f73ff90ef5a40519fed034fffc4d5f1eb6

DavidHuber-NOAA commented 2 years ago

The current authoritative develop branch as of September 9 has been ported to Jet and now resides on a fork (port_2_jet). The following subcomponents have been tested on Jet at the respective hashes/tags:

Three subcomponents have also been forked and ported with the ports residing in tags:

Some portions of the global workflow have had an initial attempt at porting but have not been tested:

This fork has been tested at 96/48, 192/96, 384/192, and 768/384 resolutions, with extended tests performed at 192/96 and 384/192.

Caveats: no attempt was made to port/test the GFS_WAFS repository, testing of VSDB was unsuccessful, and the TC Tracker software has not yet been installed on Jet.

DavidHuber-NOAA commented 2 years ago

@JiayiPeng-NOAA I have updated the Jet port of the TC_Tracker and I'm currently running a test with it. Would you mind checking the install? You can find it in /lfs1/NESDIS/nesdis-rdo2/David.Huber/TC_Tracker.

@jack-woollen I have ported the Fit2Obs package and I'm currently running a test with it. Could you check the install for me? You can find it in /lfs1/NESDIS/nesdis-rdo2/David.Huber/Fit2Obs.

DavidHuber-NOAA commented 2 years ago

@JiayiPeng-NOAA I finished a test run of the global workflow on Jet using the ported TC_Tracker. Would you be able to look at the logs, products, and install?

Logs: /lfs1/NESDIS/nesdis-rdo2/David.Huber/para/com/test_f2o_tc2/logs
Archive products: /lfs1/NESDIS/nesdis-rdo2/David.Huber/archive/test_f2o_tc2
Install: /lfs1/NESDIS/nesdis-rdo2/David.Huber/TC_Tracker

Thanks!

DavidHuber-NOAA commented 2 years ago

@jack-woollen I finished a test run of the global workflow on Jet using the ported Fit2Obs package. Would you be able to look at the logs, products, and install?

Logs: /lfs1/NESDIS/nesdis-rdo2/David.Huber/para/com/test_f2o_tc2/logs
Archive products: /lfs1/NESDIS/nesdis-rdo2/David.Huber/archive/test_f2o_tc2
Install: /lfs1/NESDIS/nesdis-rdo2/David.Huber/TC_Tracker

Thanks!

jack-woollen commented 2 years ago

David

I can't look on Jet; I have no account there. Hera, Gaea, Orion, Niagara, WCOSS, or WCOSS2 are OK.

Jack


JiayiPeng-NOAA commented 2 years ago

Hi David, I checked the GFS track/genesis data files in /mnt/lfs1/NESDIS/nesdis-rdo2/David.Huber/archive/test_f2o_tc2/atcf. They look good to me. Thanks, Jiayi


DavidHuber-NOAA commented 2 years ago

@JiayiPeng-NOAA Thank you very much for checking!

@jack-woollen I have transferred the logs and the archive to Hera. You can find them at /scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_logs and /scratch1/NESDIS/nesdis-rdo2/jet_archive. Please let me know if you need any other data. Thanks!

jack-woollen commented 2 years ago

David

As far as the fit2obs scripts and codes are concerned, they are running correctly given the inputs. Ideally they would create forecast fit files comparing time0 data with each of the forecasts valid at time0 from the 5 previous days. That is not happening, probably because there are not 5 previous days in the forecast archive when each day0 fit is being run. That should be simple to fix in the forecast archive cleanup scripts.

Jack


DavidHuber-NOAA commented 2 years ago

@jack-woollen Ah yes, I had set the forecasts to only run out to 72 hours. I'll modify that to 120 and rerun the cycling experiment. Thanks!

jack-woollen commented 2 years ago

Forecasts need to run out to 126 hours because of time interpolation. -Jack


DavidHuber-NOAA commented 2 years ago

OK, 126 hours then. Thank you!
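
For reference, that corresponds to a one-line change along these lines (assuming FHMAX_GFS is the relevant forecast-length variable in config.base for this experiment):

export FHMAX_GFS=126    # Fit2Obs needs forecasts out to 126 h for time interpolation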

DavidHuber-NOAA commented 2 years ago

@jack-woollen I have completed the longer test. The outputs and logs are in the same locations as before on Hera: /scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_logs and /scratch1/NESDIS/nesdis-rdo2/jet_archive

jack-woollen commented 2 years ago

David, the pattern of fit files valid at 2020080600 is correct for the case where one verifying forecast is made every 24 hours at 00z for the 5 previous days. Jack

/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f00.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f06.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f24.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f48.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f72.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f96.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f120.raob.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f00.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f06.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f24.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f48.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f72.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f96.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f120.acft.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f00.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f06.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f24.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f48.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f72.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f96.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f120.acar.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f00.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f06.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f24.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f48.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f72.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f96.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f120.surf.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f00.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f06.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f24.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f48.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f72.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f96.sfc.2020080600
/scratch1/NESDIS/nesdis-rdo2/David.Huber/jet_archive/test_f2o_tc2/fits/f120.sfc.2020080600


DavidHuber-NOAA commented 2 years ago

Great, thank you for checking, @jack-woollen.

DavidHuber-NOAA commented 2 years ago

@JiayiPeng-NOAA Would it be possible to have the Jet port of the TC Tracker incorporated into the repository and a new tag made?

JiayiPeng-NOAA commented 2 years ago

Hi David, you are welcome to create a new tracker tag in your own GitHub fork. I don't know whether hpc-stack is installed on JET or not. The GFS workflow team did not send me a request to create a tag for JET. Thanks, Jiayi


WalterKolczynski-NOAA commented 2 years ago

The GFS workflow team did not send me a request to create a tag for JET.

@JiayiPeng-NOAA I'll make one now then: please make a tag for Jet. hpc-stack is on Jet, and should be what is used: https://github.com/NOAA-EMC/hpc-stack/wiki/Official-Installations

JiayiPeng-NOAA commented 2 years ago

Hi Walter, Thanks for the update. I usually work with Kate to create a new tracker tag. After I make a new tracker version for JET, Kate needs to modify the GFSv16 workflow and test the tracker jobs. If everything is ok, a new TAG will be created. Kate is busy with the WCOSS2 transition. I don't think she has time now. After the GFSv16 WCOSS2 transition is done, I may start to work on this. Thanks, Jiayi


DavidHuber-NOAA commented 2 years ago

@KateFriedman-NOAA During the Jet testing, I had a failed call to global_gsi.x during a gdasanal job. However, the job did not exit with a failure status, so the gdasfcst job was then triggered and failed as well. I've opened a GSI ticket: NOAA-EMC/GSI#293.

KateFriedman-NOAA commented 2 years ago

@KateFriedman-NOAA During the Jet testing, I had a failed call to global_gsi.x during a gdasanal job. However, the job did not exit with a failure status, so the gdasfcst job was then triggered and failed as well. I've opened a GSI ticket: NOAA-EMC/GSI#293.

Roger that, thanks for the update @DavidHuber-NOAA !

arunchawla-NOAA commented 2 years ago

Hi, can someone provide a status on this? @KateFriedman-NOAA @lgannoaa @WalterKolczynski-NOAA @DavidHuber-NOAA

DavidHuber-NOAA commented 2 years ago

@arunchawla-NOAA The global workflow code changes are complete. I have completed cycled ATM tests at C192/C96 (40-cycle) and C384/C192 (4-cycle) on xjet. I still need to run a 1-cycle C768/C384 test on xjet and a 5-cycle C384/C192 test on kjet.

I have working ports of the TC_Tracker and Fit2Obs in my personal area on Jet that need to be incorporated into their respective authoritative repos. Jiayi has requested to work with Kate on the TC_Tracker. A request should be made for the Fit2Obs port as well. They will then need to be installed in official locations.

I'm not currently focused on this port as I am finishing up the S4 port and waiting on the TC_Tracker and Fit2Obs packages.