E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Potential segmentation fault in write_darray_multi_par() during ADIOS conversion of large restart files #572

Closed dqwu closed 4 months ago

dqwu commented 4 months ago

While attempting to convert a sizable eamxx restart file (exceeding 2 TB) from ADIOS BP5 format to NetCDF format on Frontier, a segmentation fault was observed:

srun: error: frontier00005: task 32: Segmentation fault

[Core dump stack trace with GDB]

#0  0x000000000047a090 in write_darray_multi_par ()
#1  0x0000000000460b08 in PIOc_write_darray_multi ()
#2  0x000000000048004e in flush_buffer ()
#3  0x0000000000408466 in sync_file ()
#4  0x0000000000408f63 in PIOc_sync ()
#5  0x000000000052fab4 in adios2_ConvertVariableDarray<double> (v_base=..., bpIO=..., bpReader=..., varname=..., ncid=18, var=..., decomp_map=..., iosysid=2050, file0=..., adios=..., time_step=0, 
    comm=-1006632956, mpirank=230, nproc=512, block_procs=..., local_proc_blocks=..., block_list=..., processed_attrs=..., decomp_cache=...)
    at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm-lib.cxx:2045
#6  0x00000000004b724c in ConvertVariableDarray (bpIO=..., bpReader=..., varname=..., ncid=18, var=..., decomp_map=..., iosysid=2050, file0=..., adios=..., time_step=0, comm=-1006632956, mpirank=230, 
    nproc=512, block_procs=..., local_proc_blocks=..., block_list=..., processed_attrs=..., decomp_cache=...)
    at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm-lib.cxx:2124
#7  0x00000000004b8d9a in ConvertBPFile (infilepath=..., outfilename=..., pio_iotype=1, rearr=..., comm_in=1140850688)
    at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm-lib.cxx:2417
#8  0x00000000004b9ea2 in ConvertBPToNC (infilepath=..., outfilename=..., piotype=..., rearr=..., comm_in=1140850688)
    at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm-lib.cxx:2542
#9  0x00000000004ba24c in MConvertBPToNC (bppdir=..., piotype=..., rearr=..., comm=1140850688)
    at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm-lib.cxx:2671
#10 0x0000000000403661 in main (argc=2, argv=0x7fffffff1418) at scream/externals/scorpio/tools/adios2pio-nm/adios2pio-nm.cxx:166

Further investigation shows that the issue stems from stack-allocated pointer arrays in write_darray_multi_par(). Specifically, startlist and countlist are variable-length arrays sized by num_regions, and they need to be moved to the heap because of their size.

Here's the relevant code snippet:

int write_darray_multi_par(...)
{
    ...
    /* If this is an IO task write the data. */
    if (ios->ioproc)
    {
        ...
        PIO_Offset *startlist[num_regions]; /* Array of start arrays for ncmpi_iput_varn(). */
        PIO_Offset *countlist[num_regions]; /* Array of count  arrays for ncmpi_iput_varn(). */
        ...
    }
    ...
}

For IO tasks, num_regions can be as large as 167,772,224. With 8-byte pointers, each array then needs about 1.25 GB, so startlist and countlist together require approximately 2.5 GB of stack space.
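The size estimate above can be checked with a few lines of arithmetic. This is only an illustrative sketch: pointer_array_bytes() is a hypothetical helper, not part of scorpio, and the pointer size is passed in explicitly (8 bytes is assumed for a 64-bit system such as Frontier).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of scorpio): bytes needed for num_arrays
 * arrays that each hold num_regions pointers of pointer_size bytes.
 * The pointer size is a parameter so the 64-bit case can be checked on
 * any machine. */
uint64_t pointer_array_bytes(uint64_t num_regions, uint64_t pointer_size,
                             uint64_t num_arrays)
{
    assert(pointer_size > 0);
    return num_regions * pointer_size * num_arrays;
}
```

For num_regions = 167,772,224, 8-byte pointers, and two arrays, this gives 2,684,355,584 bytes, which is about 2.5 GiB and well above a 300000 KB stack limit.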

Please be aware that the ADIOS conversion tool uses the SUBSET rearranger by default. If the BOX rearranger is used instead, the issue does not reproduce, because with the BOX rearranger num_regions is always 1.

Note that on Frontier the stack size limit is set to roughly 300 MB, far below the approximately 2.5 GB these arrays require:

core file size          (blocks, -c) unlimited
...
stack size              (kbytes, -s) 300000
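The limit shown above can also be queried from inside a program with the POSIX getrlimit() call. The following is a minimal sketch, not code from scorpio; fits_in_stack_limit() is a hypothetical helper, and it ignores stack space already in use, so a result of 1 does not guarantee that an allocation of that size would succeed.

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of scorpio).
 * Returns 1 if `bytes` is within the current soft stack limit,
 * 0 if it is not, and -1 if the limit could not be queried.
 * Stack space already in use is not accounted for. */
int fits_in_stack_limit(unsigned long long bytes)
{
    struct rlimit rl;

    /* The cast below must not truncate the limit value. */
    assert(sizeof(rlim_t) <= sizeof(unsigned long long));

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return bytes <= (unsigned long long)rl.rlim_cur ? 1 : 0;
}
```

With the 300000 KB limit reported on Frontier, a request of about 2.5 GB would return 0.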

To resolve this, these arrays should be allocated on the heap rather than on the stack, so that large values of num_regions no longer cause a segmentation fault.
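One possible shape for such a change is sketched below. It is not the actual patch applied to scorpio: the helper names are hypothetical, and sketch_offset is a stand-in for PIO_Offset (assumed here to be a 64-bit integer type). The point is only that the two pointer arrays come from calloc() and are checked for allocation failure, instead of being declared as variable-length arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PIO_Offset; an assumption made for this sketch only. */
typedef long long sketch_offset;

/* Hypothetical helper: allocate the two per-region pointer arrays on the
 * heap. Returns 0 on success and -1 on failure; on failure both outputs
 * are set to NULL. */
int alloc_region_lists(size_t num_regions, sketch_offset ***startlist,
                       sketch_offset ***countlist)
{
    assert(startlist != NULL && countlist != NULL);
    *startlist = NULL;
    *countlist = NULL;
    if (num_regions == 0)
        return -1;

    /* calloc zero-fills the arrays, so unset entries are NULL. */
    sketch_offset **starts = calloc(num_regions, sizeof *starts);
    sketch_offset **counts = calloc(num_regions, sizeof *counts);
    if (starts == NULL || counts == NULL)
    {
        free(starts); /* free(NULL) is a no-op */
        free(counts);
        return -1;
    }

    *startlist = starts;
    *countlist = counts;
    return 0;
}

/* Hypothetical helper: release arrays obtained from alloc_region_lists(). */
void free_region_lists(sketch_offset **startlist, sketch_offset **countlist)
{
    free(startlist);
    free(countlist);
}
```

Unlike a stack overflow, a failed heap allocation is reported through a return value, so the caller can return an error code instead of crashing.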

See also issue #17, PR #264, scream PR E3SM-Project/scream#2393

rljacob commented 4 months ago

Do you need to convert the restart files to netcdf? Anytime they are that large, assume you're doing bp-only restarts.

dqwu commented 4 months ago

> Do you need to convert the restart files to netcdf? Anytime they are that large, assume you're doing bp-only restarts.

Actually the conversion is not required for restart runs. However, this issue still needs to be fixed for the SUBSET rearranger, which is the default rearranger used by the conversion tool.

dqwu commented 4 months ago

Variable-length arrays (VLAs) are supported by C99, but they should be used sparingly. Normally, they should only be used for local arrays of relatively small size.