GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0

pFIO segmentation fault :: C1440 writing a single 3d variable using ExtDataDriver.x #1942

Closed metdyn closed 1 year ago

metdyn commented 1 year ago

Problem Description

A test for writing HISTORY data using ExtDataDriver.x exposed a Segmentation fault error from MultiGroupServer.F90 (L 768) https://github.com/GEOS-ESM/MAPL/blob/main/pfio/MultiGroupServer.F90#L768

Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2abc5e2c2f60)

Many thanks to Ben.

To reproduce:

Use ExtDataDriver.x to set up a C1440 grid, create one 3d variable (nx, ny, nz=181), then write that variable to an nc4 file using one cascade node (46 cores). See details in /discover/nobackup/yyu11/run/mapl_run_tutorial/test_c1440_1_or_2_node or Ben's copy: /gpfsm/dnb31/bmauer/tmp/example_extdatadriver

Let

nps=24  
exe=bin/ExtDataDriver.x
IOSERVER_OPTIONS="--npes_model $nps --nodes_output_server $io_node"
IOSERVER_EXTRA="--oserver_type multigroup --npes_backend_pernode $backend_per_node"

The run prints:

 Starting pFIO input server on Clients
 MultiServer Start: nfront, nwriter          24          21
Starting pFIO output server on 1 nodes
 Running new case
     SHMEM: NumCores per Node = 24
     SHMEM: NumNodes in use   = 1
     SHMEM: Total PEs         = 24
     SHMEM: NumNodes in use  = 1

...

EXPSRC:
 EXPID:
 Descr:
 DisableSubVmChecks: F

 Reading HISTORY RC Files:
 -------------------------
 NOT using buffer I/O for file: HISTORY1.rc
 NOT using buffer I/O for file: case1.rcx

 Freq: 00010000  Dur: 00000000  TM:   -1  Collection: case1

 Independent Output Export States:
 ---------------------------------
           1 Root

 Initializing Output Stream: case1
 ---------------------------
       Format: CFIO
         Mode: instantaneous
       Slices:          181
      Deflate:            0
    Frequency:        10000
     Ref_Date:     20040101
     Ref_Time:            0
     Duration:            0
  Output RSLV:         1440        8640
    XY-offset:            0   (DcPc: Dateline Center, Pole Center)
      Fields: var1

[borgh232:43098:0:43098] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x2abc5e2c2f60)
==== backtrace (tid:  43098) ====
 0  /usr/lib64/libucs.so.0(ucs_handle_error+0xe4) [0x2abc5f506da4]
 1  /usr/lib64/libucs.so.0(+0x2210c) [0x2abc5f50710c]
 2  /usr/lib64/libucs.so.0(+0x222c2) [0x2abc5f5072c2]
 3  /gpfsm/dnb33/yyu11/br_mapl/install_mapl/bin/../lib/libMAPL.pfio.so(+0x848f52) [0x2abafae58f52]
 4  /gpfsm/dnb33/yyu11/br_mapl/install_mapl/bin/../lib/libMAPL.pfio.so(pfio_multigroupservermod_mp_start_back_+0x88d) [0x2abafae289a3]
 5  /gpfsm/dnb33/yyu11/br_mapl/install_mapl/bin/../lib/libMAPL.pfio.so(pfio_multigroupservermod_mp_start_+0x71b) [0x2abafae07fa3]
 6  /gpfsm/dnb33/yyu11/br_mapl/install_mapl/bin/../lib/libMAPL.base.so(mapl_servermanager_mp_initialize_+0x1965d) [0x2abaf9c9a045]
 7  /discover/nobackup/yyu11/br_mapl/install_mapl/bin/ExtDataDriver.x() [0x472b30]
 8  /discover/nobackup/yyu11/br_mapl/install_mapl/bin/ExtDataDriver.x() [0x46de8b]
 9  /discover/nobackup/yyu11/br_mapl/install_mapl/bin/ExtDataDriver.x() [0x46a57a]
10  /discover/nobackup/yyu11/br_mapl/install_mapl/bin/ExtDataDriver.x() [0x42b092]
11  /lib64/libc.so.6(__libc_start_main+0xf5) [0x2abb02544ac5]
12  /discover/nobackup/yyu11/br_mapl/install_mapl/bin/ExtDataDriver.x() [0x42afa9]
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
ExtDataDriver.x    00000000005F41FA  Unknown               Unknown  Unknown
libpthread-2.22.s  00002ABB018F0CE0  Unknown               Unknown  Unknown
libMAPL.pfio.so    00002ABAFAE58F52  pfio_multigroupse         768  MultiGroupServer.F90
libMAPL.pfio.so    00002ABAFAE289A3  pfio_multigroupse         450  MultiGroupServer.F90
libMAPL.pfio.so    00002ABAFAE07FA3  pfio_multigroupse         202  MultiGroupServer.F90
libMAPL.base.so    00002ABAF9C9A045  mapl_servermanage         265  ServerManager.F90
ExtDataDriver.x    0000000000472B30  extdatadrivermod_         172  ExtDataDriverMod.F90
ExtDataDriver.x    000000000046DE8B  extdatadrivermod_          91  ExtDataDriverMod.F90
ExtDataDriver.x    000000000046A57A  MAIN__                     25  ExtDataDriver.F90
ExtDataDriver.x    000000000042B092  Unknown               Unknown  Unknown
libc-2.22.so       00002ABB02544AC5  __libc_start_main     Unknown  Unknown
ExtDataDriver.x    000000000042AFA9  Unknown               Unknown  Unknown

weiyuan-jiang commented 1 year ago

I suspect that is too large for pFIO. 1440 * 1440 * 6 * 180 = 2,239,488,000, which is greater than the max integer of kind=4 (2,147,483,647). So the pointer assignment at line 768 breaks (s0, e0 may exceed max int):

l_5d(1:q%count(1),1:q%count(2),1:q%count(3),1:q%count(4), 1:q%count(5))=> f_d_m%idata(s0:e0)
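
The arithmetic above can be checked with a short sketch. This is illustrative Python, not MAPL code: `wrap_int32` is a hypothetical helper that mimics the two's-complement wraparound a 32-bit product typically shows in practice (the Fortran standard itself does not define the result of integer overflow).

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647, the largest default (kind=4) integer


def wrap_int32(value):
    """Hypothetical helper: fold a value into the signed 32-bit range."""
    return (value + 2**31) % 2**32 - 2**31


# Global element count for one C1440 field: nx * ny * 6 faces * 180 levels
n_elements = 1440 * 1440 * 6 * 180

print("elements:          ", n_elements)
print("exceeds INT32 max: ", n_elements > INT32_MAX)
print("wrapped 32-bit val:", wrap_int32(n_elements))
```

A negative or otherwise wrong size computed this way would make any index range derived from it unreliable, which is why a 64-bit kind is suggested for these variables.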

weiyuan-jiang commented 1 year ago

@metdyn, would you please change the declarations of s0, e0, s1, e2, ... and msize_word to integer(kind=8) and try again?

metdyn commented 1 year ago

@weiyuan-jiang, Thanks, sure, I will rebuild and let you know!

tclune commented 1 year ago

Yes - this is probably just a limitation of default integers. But this makes C5760 scary. Will be 16x larger yet. But I suppose still smaller than a node memory. ~143 GB if I did the math right.

metdyn commented 1 year ago

The math for memory of a single DP 3d variable (6 tile, 8 byte for DP):
nx * nx * nz * 6 * 8 / 1024**3 [GB] shows for nz=181

C1440:    16.7 GB
C2880:    67.1 GB
C5760:    268.4 GB
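
The formula above can be evaluated directly. The sketch below is illustrative Python; the function name `field_size_gib` and its parameter names are assumptions made for this example, and the results agree with the table up to rounding in the last digit.

```python
GIB = 1024**3  # the formula divides by 1024**3, so sizes are in GiB


def field_size_gib(nx, nz=181, faces=6, bytes_per_value=8):
    """Memory for one cubed-sphere 3d field: nx * nx * nz * faces * bytes."""
    return nx * nx * nz * faces * bytes_per_value / GIB


for nx in (1440, 2880, 5760):
    print(f"C{nx}: {field_size_gib(nx):7.2f} GiB")
```

Passing `bytes_per_value=4` gives the corresponding size for 32-bit reals, which is exactly half of each value.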

Maybe we can use a large-memory node on discover to test C5760 with a single node for IO?

tclune commented 1 year ago

@weiyuan-jiang How did Bill ever do the C5760 case? Yes, he only wrote odd levels, but that is still far too large for INT32.

Hmmm.

tclune commented 1 year ago

@metdyn GEOS uses 32 bit reals almost exclusively. So your math above is high by a factor of 2.

tclune commented 1 year ago

Our solution for node memory limitations is to either write odd/even levels to different collections or different faces to different collections. Easy changes that Ben/Weiyuan can help with. But we may encounter other limits on the way (like the INT32 indices you are seeing).

metdyn commented 1 year ago

> @metdyn GEOS uses 32 bit reals almost exclusively. So your math above is high by a factor of 2.

@tclune with Ben's ExtDataDriver.x, we are writing only one collection containing one 3d variable over its full range (all levels and faces). I will check with Ben/Weiyuan for the detailed options. You are right: if only single-precision reals are communicated to the IO nodes, the sizes are half of what I calculated. That is good news! We can continue to push up the resolution using a single node for IO, which also tests the capability and efficiency of the code.

metdyn commented 1 year ago

@weiyuan-jiang: s0 and e0 did not go out of bounds (both are < 2^31). However, the values in f_d_m%idata(s0:e0) are very large in magnitude (about -2^30) and look suspicious. What do the values in f_d_m%idata(s0:e0) stand for?

I changed the code, rebuilt, and re-ran:

- integer :: msize_word, d_rank, request_id
+ integer (INT64) :: msize_word
+ integer :: d_rank, request_id

In fact, the output does not change after integer*8 is used for msize_word.

I find something strange:

ck: d_rank=           1
 ck: d_rank=           3
 ck: d_rank=           3
 ck: d_rank=           3
 ck: d_rank=           3
 ck: d_rank=           5
 ck, s0:e0          2073602    95904001
 ck, q%start(1:5)           1           1           1           1           1
 ck, q%count(1:5)         720         720           1         181           1
 ck in case= 5
 s1:e1,s2:e2,s3:e3,s4:e4,s5:e5           1         720           1         720
           1           1           1         181           1           1
 ck l_5d(1:q%count(1),1:q%count(2),1:q%count(3),1:q%count(4), 1:q%count(5))
 -1020846080

 ck f_d_m%idata(s0:e0)
 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080
 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080
 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080
 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080 -1020846080

tclune commented 1 year ago

Can someone paste a hyperlink of where s0 and e0 are computed? I fear that the INT64 issue needs to be expanded a bit.

weiyuan-jiang commented 1 year ago

There are many places that need to be changed. For example, here https://github.com/GEOS-ESM/MAPL/blob/cfad78e595c49df9fd741b81e01911920c75cfb1/pfio/MultiGroupServer.F90#L667 the expression needs to be product(int(q%global_count,INT64)) to get the size right. There may be other places; I will keep looking.

weiyuan-jiang commented 1 year ago

@tclune
This is the global-to-local mapping: the data, accumulated from all local processors, is assembled into a multi-dimensional array again. See https://github.com/GEOS-ESM/MAPL/blob/cfad78e595c49df9fd741b81e01911920c75cfb1/pfio/MultiGroupServer.F90#L752 and https://github.com/GEOS-ESM/MAPL/blob/cfad78e595c49df9fd741b81e01911920c75cfb1/pfio/MultiGroupServer.F90#L770

weiyuan-jiang commented 1 year ago

@metdyn, can you try this branch? https://github.com/GEOS-ESM/MAPL/tree/feature/wjiang/large_output . There are still limits in this branch: the local size cannot exceed 2 GB.

metdyn commented 1 year ago

@weiyuan-jiang, I found that this branch did not solve my C1440 ExtDataDriver.x run problem. (I rebuilt it twice, the second time from scratch.) Will you please continue to pin down the problem? I will compare the data output from low to high resolution. Thanks!

weiyuan-jiang commented 1 year ago

I don't know. Maybe he wrote level by level.

tclune commented 1 year ago

Nope. We found the HISTORY.rc file (ask @mathomp4 if you want the details). Definitely wrote out on alternating levels O(30) in one file.

weiyuan-jiang commented 1 year ago

The problem may come down to this allocation:

allocate(x, source =II, STAT=err)
print*, "err: ", err

where x is class(*), allocatable, dimension(:). If the length of "II" exceeds max int, then err=41 with Intel compilers up to and including 2021.6.0. The latest compiler, 2021.7.0, seems fine. gcc 12.1.0 also looks fine.

Further test is needed.

weiyuan-jiang commented 1 year ago

@metdyn, indeed the run passes with the GNU compiler. We need Baselibs built with Intel 2021.7.0 for the test. @mathomp4

metdyn commented 1 year ago

@weiyuan-jiang That sounds great! I will build with gcc 12.1.0 to see the results on MAPL history output, and with intel 2021.7.0 (when available). Thanks a lot!

mathomp4 commented 1 year ago

> @metdyn, indeed the run passes with the GNU compiler. We need Baselibs built with Intel 2021.7.0 for the test. @mathomp4

Well, I do have a Baselibs with Intel 2021.7. But, you can't run GEOS with it. And it seems like it infects some of @tclune's other code as well:

https://github.com/GEOS-ESM/MAPL/issues/1933

I think a workaround is needed...somewhere.

mathomp4 commented 1 year ago

That said, if you'd like to try:

/gpfsm/dhome/mathomp4/GitG5Modules/SLES12/7.7.0/g5_modules.intel2021_7_0.impi2021_7_0

is an Intel 2021.7 g5_modules

metdyn commented 1 year ago

Thank you, @mathomp4, for the information on using gcc12! @weiyuan-jiang, Thank you very much for your work! I used gcc12 Baselibs from @mathomp4 to rebuild MAPL. The C1440 oserver writing problem disappears. Output is:

MultiServer Start: nfront, nwriter          24          21
Starting pFIO output server on 1 nodes

     SHMEM: Total PEs         = 24

 8.8G Jan 25 11:37 case1.2004.nc4

I will continue to increase resolution to test oserver. Thanks!

mathomp4 commented 1 year ago

Wow. You can write a C1440 file out on a single node? Impressive!

metdyn commented 1 year ago

Hi @weiyuan-jiang, Hi @mathomp4 :

What do you think?

mathomp4 commented 1 year ago

@metdyn Well, you can try model large and see what happens. I'm not sure we've ever done any runs with it as we've never needed it.

metdyn commented 1 year ago

Hi everyone,

This is the output from writing C2880 with 1 IO node. It looks like we are now hitting the actual memory limit. In MAPL/pfio/UnlimitedEntity.F90, around line 116: Error allocating 36030873600 bytes: Cannot allocate memory.

36030873600 bytes = 33.5 G bytes
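
One way to read this number, checked in the sketch below, is that it matches the size of a single full C2880 field with 6 faces and 181 levels stored as 32-bit reals. This is illustrative Python based only on the figures in this thread; the variable names are assumptions, and the sketch does not show which MAPL object actually requested the memory.

```python
requested_bytes = 36030873600  # taken from the allocation error message

# One C2880 field: nx * nx * 6 faces * 181 levels, 4 bytes per 32-bit real
nx, faces, nz, bytes_per_real = 2880, 6, 181, 4
one_field_bytes = nx * nx * faces * nz * bytes_per_real

print("matches one single-precision field:", one_field_bytes == requested_bytes)
print(f"size: {requested_bytes / 1024**3:.2f} GiB")
```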

 MultiServer Start: nfront, nwriter          24          21
Starting pFIO output server on 1 nodes
     SHMEM: Total PEs         = 24

In file '/discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/UnlimitedEntity.F90', around line 116: Error allocating 36030873600 bytes: Cannot allocate memory

Error termination. Backtrace:
#0  0x2b211084139b in __pfio_unlimitedentitymod_MOD_new_unlimitedentity_1d
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/UnlimitedEntity.F90:115
#1  0x2b211086c9e9 in __pfio_attributemod_MOD_new_attribute_1d
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/Attribute.F90:56
#2  0x2b2110a70ba0 in start_back_writers
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/MultiGroupServer.F90:669
#3  0x2b2110a82201 in __pfio_multigroupservermod_MOD_start_back
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/MultiGroupServer.F90:447
#4  0x2b2110a8b457 in __pfio_multigroupservermod_MOD_start
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/pfio/MultiGroupServer.F90:201
#5  0x2b21101089f7 in __mapl_servermanager_MOD_initialize
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/base/ServerManager.F90:265
#6  0x42f664 in __extdatadrivermod_MOD_initialize_io_clients_servers
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/Tests/ExtDataDriverMod.F90:183
#7  0x42f9a4 in __extdatadrivermod_MOD_run
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/Tests/ExtDataDriverMod.F90:91
#8  0x42c36e in extdata_driver
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/Tests/ExtDataDriver.F90:25
#9  0x42c3f6 in main
        at /discover/nobackup/yyu11/br_mapl_wj_Jan25_gcc/MAPL/Tests/ExtDataDriver.F90:6
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------

metdyn commented 1 year ago

> @metdyn Well, you can try model large and see what happens. I'm not sure we've ever done any runs with it as we've never needed it.

Thank you and I will give a try.

tclune commented 1 year ago

I doubt model=large is our issue.

@weiyuan-jiang and I were talking yesterday and identified at least one place where the memory requirements are probably 2x the actual data, and possibly 3x. We also discussed possible mitigations.

We also need to check with @wmputman to see if he ever wrote full 3D fields at C2880. The fact that he was able to write odd/even levels at C5760 strongly suggests that C2880 ought to "work" without any unusual steps on our part.

But ... maybe some recent changes in pfio have compounded the memory usage? Only @weiyuan-jiang would be able to figure that out.

weiyuan-jiang commented 1 year ago

@metdyn I doubt that the CMake flag has any impact on this. If the allocatable is not class(*), there is no error, so I think it is more likely an ifort bug. (move_alloc works fine with the previous ifort.) Here is a simple example you can play with:

program main
   use, intrinsic :: iso_c_binding, only: c_ptr
   use, intrinsic :: iso_c_binding, only: c_loc
   use, intrinsic :: iso_fortran_env, only: INT32, INT64
   use, intrinsic :: iso_c_binding, only: c_f_pointer
   implicit none
   integer(kind=INT64) :: s(3)
   integer, allocatable :: Is(:)
   type(c_ptr) :: address
   integer, pointer :: g1d(:)
   class(*), allocatable, target :: x(:)
   integer :: i, err

   ! Array lengths just below, at, and just above the default-integer maximum
   s(1) = huge(0) - 1_INT64
   s(2) = huge(0)
   s(3) = huge(0) + 1_INT64

   print*, " Maxint :", huge(0)
   do i = 1, 3
      print*, "size:", s(i)
      allocate(Is(s(i)), source=-1)

      !call move_alloc(Is, x)
      ! Sourced allocation into an unlimited polymorphic array
      allocate(x, source =Is, STAT=err)
      print*, "err: ", err
      select type (x)
      type is (integer(INT32))
         print*, size(x, kind=INT64), "sizeof(xptr)"
         address = c_loc(x(1))
         call c_f_pointer(address, g1d, shape=[s(i)])
         print*, "last element of g1d:", g1d(s(i))
      class default
         print*, "type is wrong"
      end select
      deallocate(x, Is)

   enddo

end program

metdyn commented 1 year ago

Hi @weiyuan-jiang @tclune : You are both right. I was wrong.
For comp/intel/2021.6.0, the same allocation error occurs regardless of whether -mcmodel=medium or -mcmodel=large is used; the flag has no effect. As @weiyuan-jiang found, there is no allocation error when using comp/intel/2021.7.0 or comp/gcc/12.1.0 with plain flags. Thank you @weiyuan-jiang for the test code, which is very illustrative!

weiyuan-jiang commented 1 year ago

I have another PR https://github.com/Goddard-Fortran-Ecosystem/gFTL/pull/189

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

mathomp4 commented 1 year ago

@weiyuan-jiang @metdyn Is this bug fixed? The Stale Bot wanted to close this, but I want to make sure it actually is fixed

metdyn commented 1 year ago

Hi @mathomp4, thanks for asking. I have no objection to closing this issue.