NOAA-EMC / UPP

Refactor EMC_post decomposition from 1D to 2D as part of EMC_post refactoring #274

Closed GeorgeVandenberghe-NOAA closed 2 years ago

GeorgeVandenberghe-NOAA commented 3 years ago

EMC_post is currently decomposed on latitude (J) only. This is adequate for several more years, but since the post is generally being refactored, now is a good time to make the jump to 2D. A second goal is to make the 2D decomposition either flexible or have it mimic the ufs-weather-model decomposition, so developers working on both codes can exploit commonality. This will be a modestly difficult project, with most of the effort going into figuring out the plumbing of the code (in progress). This issue is being created for management and project-leader tracking. Per EMC management directives and best practices, results should be tracked through this GitHub issue or Slack, NOT email.
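To make the 1D versus 2D distinction concrete, here is a minimal Python sketch (not UPP code). The helper names `block_range` and `decompose`, the nearly-equal block split, and the rank numbering with X varying fastest are all assumptions made for illustration; the thread does not say how UPP maps ranks to tiles.

```python
def block_range(n, nparts, part):
    """Return the 1-based inclusive (start, end) of block `part` (0-based)
    when n points are split into nparts nearly equal contiguous blocks."""
    base, extra = divmod(n, nparts)
    start = part * base + min(part, extra) + 1
    end = start + base - 1 + (1 if part < extra else 0)
    return start, end


def decompose(im, jm, numx, numy, rank):
    """Hypothetical mapping: ranks are laid out with X varying fastest.
    Returns (ista, iend, jsta, jend) for the given rank."""
    px, py = rank % numx, rank // numx
    ista, iend = block_range(im, numx, px)
    jsta, jend = block_range(jm, numy, py)
    return ista, iend, jsta, jend


if __name__ == "__main__":
    im, jm = 10, 7
    # 1D (J only): numx = 1, so every rank owns the full I range.
    for rank in range(4):
        print("1D", rank, decompose(im, jm, 1, 4, rank))
    # 2D: 2 tiles in X by 2 tiles in Y.
    for rank in range(4):
        print("2D", rank, decompose(im, jm, 2, 2, rank))
```

With `numx = 1` the sketch reduces to the current J-only layout, which is one way to check a 2D code path against the existing 1D results.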

There are many OTHER scaling issues in the post that are not affected by the decomposition. Most of the issues are orthogonal to the decomposition though and can be worked independently. The most salient is input I/O of model state fields in the standalone post.

By 03/01/2021:

GeorgeVandenberghe-NOAA commented 3 years ago

And Input I/O of model state fields IS affected by decomposition, just noting.

GeorgeVandenberghe-NOAA commented 3 years ago

Wading through the code. A large fraction of the work will be modifying the I/O to either scatter 2D subdomains rather than 1D contiguous slices (the serial option), or modifying the parallel I/O to get the subdomains. The rest looks like bookkeeping with loop indices but I have not looked for stencil operators yet that need halo exchanges. I need to learn much more about the NetCDF API also. That's the status so far. Working on the standalone FV3 portion first.
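Why the I/O changes are the larger share of the work can be seen from the memory layout alone. The Python sketch below is an illustration under stated assumptions, not UPP code: it assumes a full field stored Fortran-style as (im, jm) with I varying fastest, and the helper names `flat_offsets` and `contiguous_runs` are hypothetical. It shows that a J slab is one contiguous run of memory, while a 2D tile is one separate run per row.

```python
def flat_offsets(im, ista, iend, jsta, jend):
    """0-based offsets into a Fortran-ordered (im, jm) array for the box
    ista..iend by jsta..jend (1-based, inclusive)."""
    return [(j - 1) * im + (i - 1)
            for j in range(jsta, jend + 1)
            for i in range(ista, iend + 1)]


def contiguous_runs(offsets):
    """Group ascending offsets into (start, length) contiguous runs."""
    runs = []
    for off in offsets:
        if runs and runs[-1][0] + runs[-1][1] == off:
            runs[-1][1] += 1          # extends the current run
        else:
            runs.append([off, 1])     # starts a new run
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    im = 6
    # 1D slab: rows 2..3, full I range.
    print(contiguous_runs(flat_offsets(im, 1, im, 2, 3)))
    # 2D tile: same rows, columns 4..6 only.
    print(contiguous_runs(flat_offsets(im, 4, 6, 2, 3)))
```

A serial scatter of slabs can therefore send one block per rank, whereas tiles need either a packed copy or a strided description of the data.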

HuiyaChuang-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA Agreed. My plan is to have @JesseMeng-NOAA and @BoCui-NOAA do the bookkeeping parts of changing I loop indices and take care of halo exchanges when necessary.

GeorgeVandenberghe-NOAA commented 3 years ago

Work on standalone post was promising. There were many issues just assembling a test case for inline post for ufs-weather-model. I am trying to find where in the model this is called from and how the model history files are assembled on the I/O group side. It took several days to get a working test case and then isolate a UPP library from the build so I could work with it. That's where I am now, on Jet, since WCOSS is down for a week. This process has taken much more time than expected. GWV 3/17

GeorgeVandenberghe-NOAA commented 3 years ago

For what it's worth, the Intel tracebackqq('string ',iret) call issues a traceback from wherever it's called, then keeps going. I tried that to get the call tree; it segfaults in ESMF itself but still provides enough information for me.

If iret is zero, the program terminates; if iret is -1, a traceback is written to stderr and the program continues running.

Using this, it looks like PROCESS( ), a major post routine, is called directly from something in ESMF, and there are at least thirty ESMF routines in the call chain above it. Jet Intel is currently frozen by a transient system issue on Jet.

HuiyaChuang-NOAA commented 3 years ago

Thank you @GeorgeVandenberghe-NOAA for the update. Sounds like you're testing stand-alone post and in-line post at the same time? Could you come to next Tuesday's UPP-re-engineering tag-up?

GeorgeVandenberghe-NOAA commented 3 years ago

Of course. It's my main project right now.

--

George W Vandenberghe

IMSG at NOAA/NWS/NCEP/EMC

5830 University Research Ct., Rm. 2141

College Park, MD 20740

@.***

301-683-3769(work) 3017751547(cell)

GeorgeVandenberghe-NOAA commented 3 years ago

After sync'ing with the current EMC_post develop head, I can no longer reproduce the results from that code when I apply my changes to SURFCE.f (found this while checking Bo's changes, which DO reproduce; not his problem, MINE). So far the changes consist of changing all arrays dimensioned 1:im to isx:iex, but setting isx to 1 and iex to im STILL produces differences from when the im or 1:im dimension is left in. The arrays should be EXACTLY the same shape, so I am still figuring it out. I was about to submit a PR for the changes for inspection only (not for incorporation), but now I have this issue.

GeorgeVandenberghe-NOAA commented 3 years ago

Also the differences are small. From cmp -l:

  1468035867 0 1
  1469293425 0 1
  NATLEV.GrbF06
  1240962334 0 1
  1242219892 0 1
  PRSLEV.GrbF06
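For readers who want to reproduce this kind of check without `cmp`, the following Python sketch lists differing bytes in the same spirit as `cmp -l` (1-based offset, then the two byte values, which `cmp -l` prints in octal). The function name `differing_bytes` and the temporary test files are illustrative assumptions; the sketch stops at the end of the shorter file rather than reporting EOF.

```python
import os
import tempfile


def differing_bytes(path_a, path_b, chunk=1 << 20):
    """Yield (offset, byte_a, byte_b) for each differing byte.
    Offsets are 1-based, as in `cmp -l`."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if not a or not b:
                break
            for k, (x, y) in enumerate(zip(a, b)):
                if x != y:
                    yield offset + k + 1, x, y
            offset += min(len(a), len(b))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pa, pb = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
        data = bytearray(b"GRIB" + bytes(60))   # stand-in payload, not real GRIB2
        with open(pa, "wb") as f:
            f.write(data)
        data[20] = 1
        data[40] = 1
        with open(pb, "wb") as f:
            f.write(data)
        for off, x, y in differing_bytes(pa, pb):
            print(off, format(x, "o"), format(y, "o"))
```

A byte-exact comparison like this flags any difference at all; it cannot tell whether the differing bytes belong to a field value or to padding.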

HuiyaChuang-NOAA commented 3 years ago

After sync'ing with the current EMC_post develop head, I can no longer reproduce the results from that code when I apply my changes to SURFCE.f (found this checking Boi's changes which DO reproduce.. not his problem, MINE) So far the changes consist of changing all arrays dimensioned 1:im to isx:iex but setting isx to 1 and iex to im STILL produces differences from when the im or 1:im dimension is left in. The arrays should be EXACTLY the same shape so .. figuring it out. I was about to submit a PR for the changes for inspection only (not for incorporation) but now I have this issue.

Look at the SURFCE.f history on GitHub: the latest update was Jim's fix to a threading violation 7 days ago. The commit prior to this was back in Dec. I believe you started your fork after Dec, right? @WenMeng-NOAA Did the latest threading fix change UPP regression test results?

WenMeng-NOAA commented 3 years ago

@HuiyaChuang-NOAA There are no changed results from Jim's fixes in UPP regression tests.

GeorgeVandenberghe-NOAA commented 3 years ago

I am working from a sync'ed upp develop point. I just recloned it and added the SURFCE and other necessary fixes, so my old fork isn't the issue. A difference of just two bytes in the middle of each of the files suggests something trivial, like a pad, is initializing differently, but it still causes a cmp exact regression test to fail.

I will prepare a PR soon to show my differences. Don't merge the PR, just examine it.

WenMeng-NOAA commented 3 years ago

Sometimes changes alter the UPP GRIB2 file size. In the UPP regression tests, we added a field-by-field value comparison. It is fine as long as there are no unexpected changes in the results.

GeorgeVandenberghe-NOAA commented 3 years ago

File sizes didn't change. Two bytes in the middle of each file did.

HuiyaChuang-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA can you point me at your regression test output directory. I will take a look.

GeorgeVandenberghe-NOAA commented 3 years ago

I was asking for a code evaluation only. Regression test on Jet only passes for ONE (the one examined) with fields the same but two bytes different in Grib files. I submitted the PR for an eyeball of my code only

Test output is on /mnt/lfs4/HFIP/hfv3gfs/gwv/post/emcpost/reg/fv3r_2019062000. Base files for comparison are in ./BASE in this directory

I could do this for all of the others easily but am still working out a byte difference issue in THIS one.

GeorgeVandenberghe-NOAA commented 3 years ago

I found the following line in CLDRAD.f

  real    FULL_CLD(IM,JM)   !-- Must be dimensioned for the full domain

Why do we need the full domain? I am concerned I may miss others that need the full domain, although so far I am only redimensioning partial-J-domain arrays, replacing IM with isx:iex.

WenMeng-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA My understanding is that full_cld is used for calling routine AllGETHERV for halo exchange. See line 938. @HuiyaChuang-NOAA may chime in with details.

BoCui-NOAA commented 3 years ago

Wen is right. FULL_CLD(IM,JM) must be defined for the full domain because it is passed to subroutine allgetherv, which calls mpi_allgather, and grid1 there must have dimension (im,jm).
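The reason a gathered array has to be full-domain on every rank can be shown with a small stand-in. This Python sketch only simulates the collective in one process; it does not call MPI, and the function name `allgather_rows` is hypothetical. Each rank contributes its own J rows and every rank receives the assembled (im, jm) array.

```python
def allgather_rows(local_slabs):
    """Simulate an all-gather over J slabs.

    local_slabs[r] holds the rows owned by rank r, in rank order.
    Returns one full-domain copy per rank, since after an all-gather
    every rank holds all rows, not just its own.
    """
    full = [row for slab in local_slabs for row in slab]
    return [[list(row) for row in full] for _ in local_slabs]


if __name__ == "__main__":
    # Two ranks, im = 2: rank 0 owns rows 1..2, rank 1 owns row 3.
    slabs = [[[1, 2], [3, 4]], [[5, 6]]]
    for rank, full in enumerate(allgather_rows(slabs)):
        print(rank, full)
```

Because the receive side holds every rank's rows, its dimensions follow the global grid rather than the local subdomain.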

I made a note in the document at https://docs.google.com/spreadsheets/d/10jlqaBHlcg8xHHc4kH1JWJbTPGMcZeZLNbMCszdza2c/edit#gid=0

@WenMeng-NOAA @GeorgeVandenberghe-NOAA

GeorgeVandenberghe-NOAA commented 3 years ago

Has anyone looked at the "inspection only" PR submitted late last week, April 1 or so for second opinions and comments?

WenMeng-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA I haven't got the chance to look at it yet. I might do this week.

GeorgeVandenberghe-NOAA commented 3 years ago

Ok. It's not slowing me down, so don't be rushed.

BoCui-NOAA commented 3 years ago

I will start to look at it this week.

GeorgeVandenberghe-NOAA commented 3 years ago

There is a data structure datapd. The following line in CLDRAD.f

  datapd(1:im,1:jend-jsta+1,cfld)=GRID1(1:im,jsta:jend)

suggests it's used as some kind of halo pad. Could someone describe this in more detail before I figure out how it should be decomposed in the I direction? Will it need the full I dimension, a superset of the rank's I domain of isx:iex, or just the rank's I domain?

WenMeng-NOAA commented 3 years ago

My understanding is that this array is for writing field values in GRIB2 in full domain. You may see it in a lot of routines. I would defer this question to @HuiyaChuang-NOAA or @junwang-noaa for detail.

GeorgeVandenberghe-NOAA commented 3 years ago

This is how it's allocated. I haven't changed it:

  allocate(datapd(im,1:jend-jsta+1,nrecout+100))
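The index mapping in that assignment can be spelled out in a short Python sketch. The first form mirrors the quoted line: global row j lands in local row j - jsta + 1 (0-based `j - jsta` here) and the full I range is copied. The optional `ista`/`iend` arguments are a hypothetical illustration of what filling only a rank's own columns of a full-I buffer might look like; nothing in the thread confirms that UPP does this. The function name `pack_rows` and the dictionary representation of GRID1 are assumptions.

```python
def pack_rows(grid1, im, jsta, jend, ista=1, iend=None, fill=None):
    """Copy grid1[(i, j)] for j in jsta..jend into a buffer with
    jend - jsta + 1 rows of im values (all indices 1-based, inclusive).

    With the defaults the whole I range is copied, as in the quoted
    assignment. With ista/iend set (hypothetical), only those columns
    are filled and the rest keep `fill`.
    """
    iend = im if iend is None else iend
    buf = [[fill] * im for _ in range(jend - jsta + 1)]
    for j in range(jsta, jend + 1):
        for i in range(ista, iend + 1):
            buf[j - jsta][i - 1] = grid1[(i, j)]
    return buf


if __name__ == "__main__":
    im, jm = 4, 5
    grid1 = {(i, j): 10 * j + i for i in range(1, im + 1) for j in range(1, jm + 1)}
    print(pack_rows(grid1, im, 2, 3))                  # full I range
    print(pack_rows(grid1, im, 2, 3, ista=3, iend=4))  # hypothetical partial fill
```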

WenMeng-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA I have reviewed your inspection PR which makes sense to me. I sent you my comments on specific places. Thanks!

GeorgeVandenberghe-NOAA commented 3 years ago

I will change the variable names to be consistent and review the timer changes. I believe I found the timer either not entirely working or only reporting to the integer-truncated second. MPI_WTIME is better than rtc() on Linux systems. The timer is also reporting milliseconds, and I have a preference for seconds as the unit (with a resolution of 10**-5 seconds or better).

HuiyaChuang-NOAA commented 3 years ago

My understanding is that this array is for writing field values in GRIB2 in full domain. You may see it in a lot of routines. I would defer this question to @HuiyaChuang-NOAA or @junwang-noaa for detail.

Yes, Wen was right. Jun created this array to store all data to be written to Grib2 output thus its dimensions can not be changed.

HuiyaChuang-NOAA commented 3 years ago

Has anyone looked at the "inspection only" PR submitted late last week, April 1 or so for second opinions and comments?

I will look tomorrow. Have a few meetings today and also need to review a proposal.

BoCui-NOAA commented 3 years ago

I am running the UPP standalone test on Dell and will get results soon.

GeorgeVandenberghe-NOAA commented 3 years ago

The PR remains extant. This is fine but I have applied a lot of changes, in particular changing the boundary names as requested by reviewers. I have a generalized regression test that runs all of the regression test jobs and only requires one job be changed (for account) on each system. This is almost ready on HPSS. It will run from HPSS baseline files, or clone the development head and cut a new baseline which can then be used to test new personal branches and forks. If the latter is used, it is assumed the development head or master is good. This has changed twice in the past six weeks hence the need to cut a new baseline, a 20 minute operation on WCOSS-C and jet, slower on other systems.

GeorgeVandenberghe-NOAA commented 3 years ago

Having difficulties sync'ing emc-post master with my clone. Silent failures in the sync have caused it to drift and I am running down the differences with the original master. DO NOT MERGE THIS PR but examine for content instead.

WenMeng-NOAA commented 3 years ago

Wen imported George's feature branch "develop" from George's fork into Wen's fork and renamed as "post_2d_decomp" so George, Jesse and Bo can continue committing changes and Wen will keep this branch up to date with the upstream/develop.

HuiyaChuang-NOAA commented 3 years ago

I think we should work on decomposing ALLOCATE_ALL.f in x direction next.

HuiyaChuang-NOAA commented 3 years ago

Linked to #339 as a lot of work was documented there.

I will list summaries based on what I know. Others please feel free to add.

  1. George finished modifying MPI_FIRST along with 3 other subroutines to add x decomposition
  2. Bo and Jesse finished modifying INITPOST_GFS_Netcdf_para and MDLFLD to do x decomposition
  3. Wen provided a control file to only output model level U/V/T/Q so we can test all above x decomposition updates work when exchanges are not needed. Wen also provided a test case
  4. Bo and Jesse ran tests and after some further updates to INITPOST_GFS_Netcdf_para, the test was able to reproduce output with numx=2 (for x decomposition) but seg faulted while deallocating arrays.
  5. George started debugging segfault by adding out of bound check and found there is an out of bound violation in CLDRAD
  6. Bo started updating CLDRAD to do x decomposition. She finished last night and her tests with numx=1,2,4 all reproduced without segfaults.
  7. George ran out of bound checks for GEFS and WRF runs and found more out of bound violations. Wen is working on fixing them.

HuiyaChuang-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA @BoCui-NOAA @WenMeng-NOAA @JesseMeng-NOAA Listed my version of summaries of all your work above. Thank you for your hard work. I think we're in good shape.

I think the next step would be

  1. waiting for George to finish modifying EXCH.f and EXCH2.f
  2. Jesse will start working on updating other UPP subroutines to do x decomposition when he returns
  3. Bo will soon move on to transitioning NAEFS to wcoss2

BoCui-NOAA commented 3 years ago

I would like to add one bullet for the documents.

  1. Bo finished modifying ALLOCATE_ALL.f to do X decomposition. Regression tests with numx=1,2,4 were run and bit-identical results were generated.

HuiyaChuang-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA Can you also work on two Collect*.f in addition to two EXCH subroutines?

GeorgeVandenberghe-NOAA commented 3 years ago

Yeah. Working out how to figure out where neighbors are with 2D decomposition and it's another "easy case, mental block" situation
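The neighbor bookkeeping can be sketched in a few lines of Python. This is an illustration under assumptions, not the EXCH.f logic: it assumes ranks are numbered with X varying fastest on a numx-by-numy tile grid, and the `cyclic_x` option (wrapping the left and right neighbors around, as might suit a global grid) is hypothetical. The function name `neighbors` and the dictionary keys are also made up for this example.

```python
def neighbors(rank, numx, numy, cyclic_x=False):
    """Return the ranks adjacent to `rank` on a numx-by-numy tile grid.

    Assumes (hypothetically) that ranks are numbered with X varying
    fastest. A missing neighbor is reported as None.
    """
    px, py = rank % numx, rank // numx

    def rank_of(qx, qy):
        return qy * numx + qx

    if px > 0:
        left = rank_of(px - 1, py)
    else:
        left = rank_of(numx - 1, py) if cyclic_x else None
    if px < numx - 1:
        right = rank_of(px + 1, py)
    else:
        right = rank_of(0, py) if cyclic_x else None
    down = rank_of(px, py - 1) if py > 0 else None
    up = rank_of(px, py + 1) if py < numy - 1 else None
    return {"left": left, "right": right, "down": down, "up": up}


if __name__ == "__main__":
    for rank in range(6):
        print(rank, neighbors(rank, numx=3, numy=2))
```

With `numx = 1` every rank has no left or right neighbor, which matches the J-only case where exchanges happen only between upper and lower ranks.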

HuiyaChuang-NOAA commented 3 years ago

@BoCui-NOAA @JesseMeng-NOAA I will be traveling a lot starting this Thursday so would like to assign the subroutines for you two to decompose in X direction. Hopefully, George will be done with two EXCH.f soon so we can run a more extensive test in the near future. Keep in mind Bo will soon be diverted to work on NAEFS transition to wcoss2, I will give Jesse more subroutines to decompose. Also, let's wait to decompose g as I am hoping to work with Wen to retire some of these subroutines.

Bo: subroutines starting with A, B, C (except for Collect*.f, which will be worked on by George, and CLDRAD, which you've decomposed)

Jesse: subroutines starting with D, E, F, G, I, L, M, N, O, P, R, S, T, U, W, with the following exceptions, as some of them have been decomposed and some others may be phased out: MDLFLD.f, para_range.f, INITPOST_GFS_SIGIO, and maybe INITPOST_GFS_NEMS.f, INITPOST_NEMS.f

GeorgeVandenberghe-NOAA commented 3 years ago

The two COLLECT routines invert the scatter process. That's all. Once I crack EXCH, it should be very straightforward to do these.
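That "collect inverts scatter" relationship gives a cheap round-trip check. The Python sketch below is a single-process stand-in, not the COLLECT routines themselves; the names `scatter` and `collect` and the explicit per-rank bounds list are assumptions for illustration.

```python
def scatter(field, bounds):
    """Cut the full field (jm rows of im values) into one tile per rank.
    bounds: list of (ista, iend, jsta, jend), 1-based inclusive."""
    return [[row[ista - 1:iend] for row in field[jsta - 1:jend]]
            for (ista, iend, jsta, jend) in bounds]


def collect(tiles, bounds, im, jm):
    """Inverse of scatter: place each tile back at its global position."""
    full = [[None] * im for _ in range(jm)]
    for tile, (ista, iend, jsta, jend) in zip(tiles, bounds):
        for dj, row in enumerate(tile):
            full[jsta - 1 + dj][ista - 1:iend] = row
    return full


if __name__ == "__main__":
    im, jm = 4, 3
    field = [[10 * j + i for i in range(1, im + 1)] for j in range(1, jm + 1)]
    bounds = [(1, 2, 1, 2), (3, 4, 1, 2), (1, 2, 3, 3), (3, 4, 3, 3)]
    tiles = scatter(field, bounds)
    print(collect(tiles, bounds, im, jm) == field)
```

If the tiles cover the domain exactly once, collecting the scattered tiles should reproduce the original field; any `None` left in the result points at a gap in the bounds.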

HuiyaChuang-NOAA commented 3 years ago

@GeorgeVandenberghe-NOAA thank you. That sounds good. Please let us know when you have two EXCH subroutines done. I will ask most likely Jesse to test them.

GeorgeVandenberghe-NOAA commented 3 years ago

Once I get the bookkeeping done they should just work. If I didn't have all the distracting home issues this would be long done.

GeorgeVandenberghe-NOAA commented 3 years ago

Slow slog but it will pay off. Working on scattering a 2D coordinate array and verifying I/J match exactly. They do for the 1D decomposition, but the full subdomain array isn't populated by the scatter. That's done by exch.f and exch2.f later. But operating on the coordinate array first is a test of correctness.

In 2D the scatter looks like it will need some work. I'm not sure how that's even working now, so maybe I missed something, since without halos you are getting bit-reproducible results.
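The coordinate-array test described here can be sketched as follows. This is a hedged Python illustration rather than the actual test code: the encoding in `encode`, and the helper names `scatter_tile` and `mismatches`, are assumptions. The idea is to fill the full array with values that identify their own (i, j), scatter, and have each rank confirm that every value it received matches the global point it believes it owns.

```python
def encode(i, j, im):
    """Unique value for global point (i, j), both 1-based."""
    return (j - 1) * im + i


def scatter_tile(full, ista, iend, jsta, jend):
    """Cut one rank's tile out of the full array (rows are J, columns are I)."""
    return [row[ista - 1:iend] for row in full[jsta - 1:jend]]


def mismatches(tile, ista, jsta, im):
    """List the global (i, j) points whose tile value is not encode(i, j)."""
    bad = []
    for dj, row in enumerate(tile):
        for di, value in enumerate(row):
            i, j = ista + di, jsta + dj
            if value != encode(i, j, im):
                bad.append((i, j))
    return bad


if __name__ == "__main__":
    im, jm = 5, 4
    full = [[encode(i, j, im) for i in range(1, im + 1)] for j in range(1, jm + 1)]
    tile = scatter_tile(full, 3, 5, 2, 4)
    print(mismatches(tile, ista=3, jsta=2, im=im))   # bounds agree with the data
    print(mismatches(tile, ista=2, jsta=2, im=im))   # deliberately wrong ista
```

An off-by-one in the I bounds shows up as a mismatch at every point, which makes this kind of test a quick way to catch bookkeeping errors before real fields are involved.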

JesseMeng-NOAA commented 3 years ago

@BoCui-NOAA @JesseMeng-NOAA I will be traveling a lot starting this Thursday so would like to assign the subroutines for you two to decompose in X direction. Hopefully, George will be done with two EXCH.f soon so we can run a more extensive test in the near future. Keep in mind Bo will soon be diverted to work on NAEFS transition to wcoss2, I will give Jesse more subroutines to decompose. Also, let's wait to decompose g as I am hoping to work with Wen to retire some of these subroutines.

Bo: subroutines starts with A, B, C (except for Collect*.f which will be worked on by George and CLDRAD which you've decomposed)

Jesse: subroutines starts with D,E,F,G,I,L,M,N,O,P,R,S,T,U,W with the following exceptions as some of them have been decomposed and some others may be phased out: MDLFLD.f, para_range.f, INITPOST_GFS_SIGIO, and maybe INITPOST_GFS_NEMS.f, INITPOST_NEMS.f

Got it. Will do.

GeorgeVandenberghe-NOAA commented 3 years ago

Who originally wrote the exch*f routines? It looks trivial now that I understand the decomposition and I hope to have it working 7/27. Home issues have mitigated. I gave up on being able to get the house ready by 7/29.. just beyond what I can get done so now I can actually sleep again and focus on work😡

HuiyaChuang-NOAA commented 3 years ago

Who originally wrote the exch*f routines. It looks trivial now that I understand the decomposition and I hope to have it working 7/27. Home issues have mitigated. I gave up on being able to get the house ready by 7/29.. just beyond what I can get done so now I can actually sleep again and focus on work😡

Jim T. was the one who decomposed UPP in Y direction in 2000. :-) And the EXCH.f was written by him.

HuiyaChuang-NOAA commented 3 years ago

Who originally wrote the exch*f routines. It looks trivial now that I understand the decomposition and I hope to have it working 7/27. Home issues have mitigated. I gave up on being able to get the house ready by 7/29.. just beyond what I can get done so now I can actually sleep again and focus on work😡

Good to hear your home situation has been mitigated and you're coming back to work on this.

GeorgeVandenberghe-NOAA commented 3 years ago

Working on exch.f now. It needs ileft and iright to be defined in MPI_FIRST.f (done) and added to CTLBLK.f (done). The sendrecv also needed the X bounds of the slice retrieved from the upper and lower processors to be changed (done). The routines needing change are CTLBLK.f, MPI_FIRST.f, and EXCH.f. I added a state array to test transfers and verify we were getting what we thought we were getting in scatters. We weren't; I fixed THAT, but it was only an issue for numx > 1. Overall, steady progress. How much can we get done with exch.f working? Exch2 has shape issues and will take a little more work; it needs a sendrecv for each row rather than a single sendrecv for both, as is done now. The logic to figure out ileft and iright has been added to MPI_FIRST.f.
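As a rough picture of what an exchange in the I direction has to deliver, here is a single-process Python sketch. It is not EXCH.f and does not use MPI: the function `exchange_x`, the one-column halo width, and the `cyclic` option are assumptions for illustration, and only the names `ileft` and `iright` come from the comment above. Each tile receives the last owned column of its left neighbor and the first owned column of its right neighbor.

```python
def exchange_x(tiles, cyclic=False):
    """Return copies of `tiles` padded with one halo column on each side.

    tiles: list of tiles ordered left to right; each tile is a list of
    rows of owned values, and all tiles have the same number of rows.
    Halo entries are None where a tile has no neighbor on that side.
    """
    n = len(tiles)
    out = []
    for p, tile in enumerate(tiles):
        # Neighbor positions, in the spirit of ileft/iright.
        ileft = p - 1 if p > 0 else (n - 1 if cyclic else None)
        iright = p + 1 if p < n - 1 else (0 if cyclic else None)
        padded = []
        for r, row in enumerate(tile):
            lo = tiles[ileft][r][-1] if ileft is not None else None
            hi = tiles[iright][r][0] if iright is not None else None
            padded.append([lo] + list(row) + [hi])
        out.append(padded)
    return out


if __name__ == "__main__":
    # Two tiles side by side, two rows each; together they hold 1..8.
    tiles = [[[1, 2], [5, 6]], [[3, 4], [7, 8]]]
    for padded in exchange_x(tiles):
        print(padded)
```

In a real MPI code each of these copies would be a send/receive pair between neighboring ranks; the sketch only shows which values end up in which halo cells.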