Closed GeorgeVandenberghe-NOAA closed 2 years ago
And Input I/O of model state fields IS affected by decomposition, just noting.
Wading through the code. A large fraction of the work will be modifying the I/O to either scatter 2D subdomains rather than 1D contiguous slices (the serial option), or modifying the parallel I/O to get the subdomains. The rest looks like bookkeeping with loop indices but I have not looked for stencil operators yet that need halo exchanges. I need to learn much more about the NetCDF API also. That's the status so far. Working on the standalone FV3 portion first.
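For readers following along, here is a minimal serial sketch of the difference between the current 1D (J-only) slices and the 2D subdomains discussed above. It is not UPP code: `split_range` and `subdomain` are hypothetical helpers, the rank ordering (I varying fastest) is an assumption, and NumPy's row-major `(jm, im)` layout stands in for Fortran's `(im, jm)` so that a row of I points is contiguous in both.

```python
import numpy as np

def split_range(n, nparts, part):
    """Return the 1-based inclusive (start, end) of block `part` (0-based)
    when n points are divided as evenly as possible among nparts blocks."""
    base, extra = divmod(n, nparts)
    start = part * base + min(part, extra) + 1
    end = start + base - 1 + (1 if part < extra else 0)
    return start, end

def subdomain(field, numx, numy, rank):
    """Slice the (jm, im) global field owned by `rank` in a numx-by-numy layout.
    Assumption: ranks are numbered with the I direction varying fastest."""
    jm, im = field.shape
    px, py = rank % numx, rank // numx
    ista, iend = split_range(im, numx, px)
    jsta, jend = split_range(jm, numy, py)
    return (ista, iend, jsta, jend), field[jsta - 1:jend, ista - 1:iend]

im, jm = 12, 8
field = np.arange(im * jm, dtype=float).reshape(jm, im)

# 1D (J-only) decomposition: each rank owns whole rows, one contiguous slice.
bounds_1d, slab = subdomain(field, numx=1, numy=4, rank=1)
# 2D decomposition: each rank owns a rectangular tile, not contiguous in memory.
bounds_2d, tile = subdomain(field, numx=3, numy=2, rank=4)

print(bounds_1d, slab.shape, slab.flags["C_CONTIGUOUS"])
print(bounds_2d, tile.shape, tile.flags["C_CONTIGUOUS"])
```

The contiguity flag is the point: a J slab can be sent as one run of memory, while a 2D tile has to be packed (or described with a derived datatype) before it can be scattered.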
@GeorgeVandenberghe-NOAA Agreed. My plan is to have @JesseMeng-NOAA and @BoCui-NOAA do the bookkeeping parts of changing I loop indices and take care of halo exchanges when necessary.
Work on standalone post was promising. There were many issues just assembling a testcase for inline post in ufs-weather-model. I am trying to find where in the model the post is called from and how the model history files are assembled on the I/O group side. It took several days to get a working testcase and then isolate a UPP library from the build so I could work with it. That's where I am now, on Jet, since WCOSS is down for a week. This process has taken much more time than expected. GWV 3/17
For what it's worth, the Intel routine tracebackqq('string ',iret) issues a traceback from wherever it's called, then keeps going. I tried that to get the call tree. It segfaults in ESMF itself, but still provides enough information for me.
If iret is zero the program terminates; if iret is -1 a traceback is written to stderr and the program continues running.
Using this, it looks like PROCESS(), a major post routine, is called directly from something in ESMF, and there are at least thirty ESMF routines in the call chain above it. Intel on Jet is currently frozen by a transient system issue.
Thank you @GeorgeVandenberghe-NOAA for the update. Sounds like you're testing stand-alone post and in-line post at the same time? Could you come to next Tuesday's UPP-re-engineering tag-up?
Of course. It's my main project right now.
--
George W Vandenberghe
IMSG at NOAA/NWS/NCEP/EMC
5830 University Research Ct., Rm. 2141
College Park, MD 20740
@.***
301-683-3769(work) 3017751547(cell)
After sync'ing with the current EMC_post develop head, I can no longer reproduce the results from that code when I apply my changes to SURFCE.f. (I found this while checking Bo's changes, which DO reproduce. Not his problem, MINE.) So far the changes consist of changing all arrays dimensioned 1:im to isx:iex, but setting isx to 1 and iex to im STILL produces differences from when the im or 1:im dimension is left in. The arrays should be EXACTLY the same shape, so I am figuring it out. I was about to submit a PR for the changes for inspection only (not for incorporation), but now I have this issue.
Also, the differences are small. From cmp -l:
1468035867 0 1
1469293425 0 1
NATLEV.GrbF06
1240962334 0 1
1242219892 0 1
PRSLEV.GrbF06
Looking at the SURFCE.f history on GitHub, the latest update was Jim's fix to a threading violation 7 days ago. The commit prior to this was back in Dec. I believe you started your fork after Dec, right? @WenMeng-NOAA Did the latest threading fix change UPP regression test results?
@HuiyaChuang-NOAA There are no changed results from Jim's fixes in UPP regression tests.
I am working from a sync'ed UPP develop point. I just recloned it and added the SURFCE and other necessary fixes, so my old fork isn't the issue. A difference of just two bytes in the middle of each file suggests something trivial, like a pad, is initializing differently, but it still causes a cmp exact regression test to fail.
I will prepare a PR soon to show my differences. Don't merge the PR, just examine it.
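For anyone who wants to locate such byte differences without `cmp`, a minimal sketch follows. It mirrors the columns that `cmp -l` prints (1-based byte offset, then the two byte values in octal). The function name `differing_bytes` and the sample byte strings are made up for illustration; real GRIB files of this size would be read in chunks rather than held in memory.

```python
def differing_bytes(a: bytes, b: bytes):
    """List (1-based offset, byte in a, byte in b) for each differing position,
    mirroring the columns printed by `cmp -l` (which shows the values in octal)."""
    n = min(len(a), len(b))
    return [(k + 1, a[k], b[k]) for k in range(n) if a[k] != b[k]]

# Tiny stand-ins for a baseline file and a test file of equal size.
base = bytes([0, 7, 0, 9, 0])
test = bytes([0, 7, 1, 9, 1])

for offset, x, y in differing_bytes(base, test):
    print(f"{offset} {x:o} {y:o}")
```

Two files of identical size that differ in only a couple of isolated bytes will produce only a couple of lines here, which is the pattern reported above.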
Sometimes changes alter the UPP GRIB2 file size. In the UPP regression tests we added a field-by-field value comparison. It is fine as long as there are no unexpected changes in results.
File sizes didn't change. Two bytes in the middle of each file did.
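A field-by-field comparison of the kind described above can be sketched as follows. This is only an illustration, not the tool the UPP regression tests use: it assumes the fields have already been decoded from the GRIB2 files into NumPy arrays keyed by name, and `compare_fields` and the sample field names are hypothetical.

```python
import numpy as np

def compare_fields(base, test, rtol=0.0, atol=0.0):
    """Return the names of fields that are missing from one side, differ in
    shape, or differ in value. With rtol=atol=0 the comparison is exact."""
    changed = []
    for name in sorted(set(base) | set(test)):
        if name not in base or name not in test:
            changed.append(name)
        elif base[name].shape != test[name].shape:
            changed.append(name)
        elif not np.allclose(base[name], test[name],
                             rtol=rtol, atol=atol, equal_nan=True):
            changed.append(name)
    return changed

# Made-up decoded fields standing in for a baseline run and a test run.
base = {"TMP": np.array([280.0, 281.5]), "UGRD": np.array([3.0, -1.0])}
test = {"TMP": np.array([280.0, 281.5]), "UGRD": np.array([3.0, -1.5])}

print(compare_fields(base, test))
```

A check like this can pass even when a byte-exact `cmp` fails, because bytes outside the decoded values (padding, for example) are ignored.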
@GeorgeVandenberghe-NOAA can you point me at your regression test output directory. I will take a look.
I was asking for a code evaluation only. The regression test on Jet only passes for ONE (the one examined), with fields the same but two bytes different in the GRIB files. I submitted the PR for an eyeball of my code only.
Test output is on /mnt/lfs4/HFIP/hfv3gfs/gwv/post/emcpost/reg/fv3r_2019062000. Base files for comparison are in ./BASE in this directory
I could do this for all of the others easily but am still working out a byte difference issue in THIS one.
I found the following line in CLDRAD.f
real FULL_CLD(IM,JM) !-- Must be dimensioned for the full domain
Why do we need the full domain? I am concerned I may miss others that need the full domain, although so far I am only redimensioning partial-J-domain arrays, replacing IM with isx:iex.
@GeorgeVandenberghe-NOAA My understanding is that full_cld is used for calling routine ALLGETHERV for halo exchange. See line 938. @HuiyaChuang-NOAA may chime in with details.
Wen is right. FULL_CLD(IM,JM) must be defined for the full domain because it is passed to subroutine allgetherv, which calls mpi_allgather, and grid1 there must have dimension (im,jm).
I made a note in the document https://docs.google.com/spreadsheets/d/10jlqaBHlcg8xHHc4kH1JWJbTPGMcZeZLNbMCszdza2c/edit#gid=0
@WenMeng-NOAA @GeorgeVandenberghe-NOAA
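To illustrate why the receiving array has to span the full domain, here is a serial sketch of the bookkeeping behind a gather of J strips. It only mimics the counts and displacements an allgatherv-style call needs and makes no MPI calls. The sizes, the variable names other than `im`, `jm`, `jsta` and `jend`, and the even-split rule are assumptions, not taken from UPP.

```python
import numpy as np

im, jm, nranks = 6, 5, 3
full = np.arange(im * jm, dtype=float).reshape(jm, im)

# J-only decomposition: rank r owns rows jsta[r]..jend[r] (1-based, inclusive).
base, extra = divmod(jm, nranks)
rows = [base + (1 if r < extra else 0) for r in range(nranks)]
jsta = [1 + sum(rows[:r]) for r in range(nranks)]
jend = [jsta[r] + rows[r] - 1 for r in range(nranks)]
local = [full[jsta[r] - 1:jend[r], :].copy() for r in range(nranks)]

# Per-rank element counts and displacements into the receive buffer.
counts = [rows[r] * im for r in range(nranks)]
displs = [sum(counts[:r]) for r in range(nranks)]

# The receive buffer holds every rank's strip, so it must be im*jm long.
recvbuf = np.empty(im * jm)
for r in range(nranks):
    recvbuf[displs[r]:displs[r] + counts[r]] = local[r].ravel()
gathered = recvbuf.reshape(jm, im)

print(counts, displs, np.array_equal(gathered, full))
```

One displacement per rank is enough here only because each strip covers the full I range. Once I is split as well, a rank's tile is no longer a single contiguous run in the full array, which is part of what makes the 2D case more work.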
Has anyone looked at the "inspection only" PR submitted late last week, April 1 or so for second opinions and comments?
@GeorgeVandenberghe-NOAA I haven't got the chance to look at it yet. I might do this week.
Ok. It's not slowing me down, so don't be rushed.
I will start to look at it this week.
There is a data structure datapd. The following line in CLDRAD.f
datapd(1:im,1:jend-jsta+1,cfld)=GRID1(1:im,jsta:jend)
suggests it's used as some kind of halo pad. Could someone describe this in more detail before I figure out how it should be decomposed in the I direction? Will it need the full I dimension, a superset of the rank's I domain of isx:iex, or just the rank's I domain?
My understanding is that this array is for writing field values in GRIB2 in full domain. You may see it in a lot of routines. I would defer this question to @HuiyaChuang-NOAA or @junwang-noaa for details.
This is how it's allocated. I haven't changed it:
allocate(datapd(im,1:jend-jsta+1,nrecout+100))
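The index mapping in that assignment can be sketched in NumPy as below. The sizes are made up, `nfields` stands in for `nrecout+100`, and for simplicity `grid1` is a full `(im, jm)` array indexed with global J, which may not match how GRID1 is actually dimensioned in UPP.

```python
import numpy as np

im, jm = 6, 8
jsta, jend = 3, 5      # this rank's J range (1-based, inclusive)
nfields = 4            # stand-in for nrecout+100
cfld = 2               # 1-based counter of the output field being stored

# Fortran-ordered arrays so that the first index (I) varies fastest.
grid1 = np.arange(im * jm, dtype=float).reshape(im, jm, order="F")
datapd = np.zeros((im, jend - jsta + 1, nfields), order="F")

# Fortran: datapd(1:im,1:jend-jsta+1,cfld) = GRID1(1:im,jsta:jend)
# Global row j lands in local row j-jsta+1; the I range is copied whole.
datapd[0:im, 0:jend - jsta + 1, cfld - 1] = grid1[0:im, jsta - 1:jend]

print(datapd[:, 0, cfld - 1])   # equals grid1 at global row jsta
```

Read this way, the second dimension is a J offset relative to `jsta`, not a halo pad, and the first dimension always carries the full I range.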
@GeorgeVandenberghe-NOAA I have reviewed your inspection PR which makes sense to me. I sent you my comments on specific places. Thanks!
I will change the variable names to be consistent and review the timer changes. I believe I found the timer either not entirely working or only reporting to the integer-truncated second. MPI_WTIME is better than rtc() on Linux systems. The timer is also reporting milliseconds, and I have a preference for seconds as the unit (with a resolution of 10^-5 seconds or better).
Yes, Wen was right. Jun created this array to store all data to be written to Grib2 output thus its dimensions can not be changed.
I will look tomorrow. Have a few meetings today and also need to review a proposal.
I am running the UPP standalone test on Dell and will get results soon.
The PR remains extant. This is fine but I have applied a lot of changes, in particular changing the boundary names as requested by reviewers. I have a generalized regression test that runs all of the regression test jobs and only requires one job be changed (for account) on each system. This is almost ready on HPSS. It will run from HPSS baseline files, or clone the development head and cut a new baseline which can then be used to test new personal branches and forks. If the latter is used, it is assumed the development head or master is good. This has changed twice in the past six weeks hence the need to cut a new baseline, a 20 minute operation on WCOSS-C and jet, slower on other systems.
Having difficulties sync'ing emc-post master with my clone. Silent failures in the sync have caused it to drift and I am running down the differences with the original master. DO NOT MERGE THIS PR but examine for content instead.
Wen imported George's feature branch "develop" from George's fork into Wen's fork and renamed as "post_2d_decomp" so George, Jesse and Bo can continue committing changes and Wen will keep this branch up to date with the upstream/develop.
I think we should work on decomposing ALLOCATE_ALL.f in the x direction next.
Linked to #339 as a lot of work was documented there.
I will list summaries based on what I know. Others please feel free to add.
@GeorgeVandenberghe-NOAA @BoCui-NOAA @WenMeng-NOAA @JesseMeng-NOAA Listed my version of summaries of all your work above. Thank you for your hard work. I think we're in good shape.
I think the next step would be
I would like to add one bullet for the documents.
@GeorgeVandenberghe-NOAA Can you also work on two Collect*.f in addition to two EXCH subroutines?
Yeah. Working out how to figure out where neighbors are with 2D decomposition and it's another "easy case, mental block" situation
@BoCui-NOAA @JesseMeng-NOAA I will be traveling a lot starting this Thursday, so I would like to assign the subroutines for you two to decompose in the X direction. Hopefully, George will be done with the two EXCH.f routines soon so we can run a more extensive test in the near future. Keep in mind that Bo will soon be diverted to work on the NAEFS transition to WCOSS2, so I will give Jesse more subroutines to decompose. Also, let's wait to decompose g as I am hoping to work with Wen to retire some of these subroutines.
Bo: subroutines starting with A, B, C (except for Collect*.f, which will be worked on by George, and CLDRAD, which you've decomposed)
Jesse: subroutines starting with D,E,F,G,I,L,M,N,O,P,R,S,T,U,W, with the following exceptions, as some of them have been decomposed and some others may be phased out: MDLFLD.f, para_range.f, INITPOST_GFS_SIGIO, and maybe INITPOST_GFS_NEMS.f, INITPOST_NEMS.f
The two COLLECT routines invert the scatter process. That's all. Once I crack EXCH, it should be very straightforward to do these.
@GeorgeVandenberghe-NOAA thank you. That sounds good. Please let us know when you have the two EXCH subroutines done. I will most likely ask Jesse to test them.
Once I get the bookkeeping done they should just work. If I didn't have all the distracting home issues this would be long done.
Slow slog, but it will pay off. Working on scattering a 2D coordinate array and verifying that I/J match exactly. They do for the 1D decomposition, but the full subdomain array isn't populated by the scatter. That's done by exch.f and exch2.f later. But operating on the coordinate array first is a test of correctness.
In 2D the scatter looks like it will need some work. I'm not sure how that's even working now, so maybe I missed something, since without halos you are getting bit-reproducible results.
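The coordinate-array test described above can be sketched serially as follows. Each global point stores an encoding of its own (i, j), the array is cut into tiles, and every tile is checked against the index range its rank is supposed to own. Plain slicing stands in for the MPI scatter. `split_range`, `check_rank`, the encoding and the rank ordering are all assumptions for illustration.

```python
import numpy as np

def split_range(n, nparts, part):
    """1-based inclusive (start, end) of block `part` (0-based) of n points."""
    base, extra = divmod(n, nparts)
    start = part * base + min(part, extra) + 1
    end = start + base - 1 + (1 if part < extra else 0)
    return start, end

im, jm, numx, numy = 10, 7, 3, 2

# Coordinate array: every point encodes its own 1-based global (i, j).
i_glob, j_glob = np.meshgrid(np.arange(1, im + 1), np.arange(1, jm + 1))
coord = 10000 * j_glob + i_glob            # shape (jm, im)

def check_rank(rank):
    """Cut out this rank's tile and confirm the decoded I/J match its bounds."""
    px, py = rank % numx, rank // numx     # assumed ordering: I varies fastest
    ista, iend = split_range(im, numx, px)
    jsta, jend = split_range(jm, numy, py)
    tile = coord[jsta - 1:jend, ista - 1:iend]   # stand-in for the scatter
    j_dec, i_dec = np.divmod(tile, 10000)
    want_i = np.broadcast_to(np.arange(ista, iend + 1), tile.shape)
    want_j = np.broadcast_to(np.arange(jsta, jend + 1)[:, None], tile.shape)
    ok = np.array_equal(i_dec, want_i) and np.array_equal(j_dec, want_j)
    return ok, tile.size

results = [check_rank(r) for r in range(numx * numy)]
all_ok = all(ok for ok, _ in results)
covered = sum(n for _, n in results)
print(all_ok, covered == im * jm)
```

In a real run the tile would arrive through the scatter, so a mismatch in the decoded indices points directly at a packing or displacement error for that rank.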
Got it. Will do.
Who originally wrote the exch*f routines? It looks trivial now that I understand the decomposition, and I hope to have it working 7/27. Home issues have mitigated. I gave up on being able to get the house ready by 7/29. It's just beyond what I can get done, so now I can actually sleep again and focus on work😡
Jim T. was the one who decomposed UPP in Y direction in 2000. :-) And the EXCH.f was written by him.
Good to hear your home situation has been mitigated and you're coming back to work on this.
Working on exch.f now. It needs ileft and iright to be defined in MPI_FIRST.f (done) and added to CTLBLK.f (done). The sendrecv needs the X bounds of the slice retrieved from the upper and lower processors to be changed (done). Routines needing change are CTLBLK.f and MPI_FIRST.f as well as EXCH.f. I added a state array to test transfers and verify we were getting what we thought we were getting in scatters. We weren't. I fixed THAT, but it was only an issue for numx > 1. Overall, steady progress. How much can we get done with exch.f working? Exch2 has shape issues and will take a little more work; it needs a sendrecv for each row rather than a single sendrecv for both, as is done now. Logic to figure out ileft and iright is added to MPI_FIRST.f.
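The neighbor bookkeeping for a 2D layout can be sketched like this. It is not the logic added to MPI_FIRST.f, which is not shown in this thread. The rank ordering (I varying fastest), the names `idn` and `iup` for the J-direction neighbors, the `periodic_x` switch, and the use of `None` where MPI code would typically use MPI_PROC_NULL are all assumptions.

```python
def neighbors(rank, numx, numy, periodic_x=False):
    """Return (ileft, iright, idn, iup) for `rank` in a numx-by-numy layout.
    A missing neighbor is returned as None."""
    px, py = rank % numx, rank // numx     # assumed ordering: I varies fastest

    def at(qx, qy):
        if periodic_x:
            qx %= numx                     # wrap around in the I direction
        if 0 <= qx < numx and 0 <= qy < numy:
            return qy * numx + qx
        return None

    return at(px - 1, py), at(px + 1, py), at(px, py - 1), at(px, py + 1)

# 3 x 2 layout:   rank 3  4  5    (upper row)
#                 rank 0  1  2    (lower row)
print(neighbors(4, 3, 2))
print(neighbors(0, 3, 2))
print(neighbors(0, 3, 2, periodic_x=True))
```

With numx = 1 both ileft and iright come back empty (or equal to the rank itself when `periodic_x` is set), which matches the observation above that the scatter problem only showed up for numx > 1.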
EMC_post is currently decomposed on latitude (J) only. This is adequate for several more years, but since post is generally being refactored, now is a good time to make the jump to 2D. A second goal is to make the 2D decomposition either flexible, or just have it mimic the ufs-weather-model decomposition so developers working on both codes can exploit commonality. This will be a modestly difficult project, with most of the effort going into figuring out the plumbing of the code (in progress). This issue is being created for management and project leader tracking. Per EMC management directives and also best practices, results should be tracked through this GitHub issue or Slack, NOT email.
There are many OTHER scaling issues in the post that are not affected by the decomposition. Most of the issues are orthogonal to the decomposition though and can be worked independently. The most salient is input I/O of model state fields in the standalone post.
By 03/01/2021:
The offline post testing procedure provided by Jesse can be found here
The inline post testing procedure provided by Bo can be found here
Jesse's FV3 branch can be found here