E3SM-Project / scorpio

A high-level Parallel I/O Library for structured grid applications

Set default Lustre striping factor and unit for Frontier #518

Closed by dqwu 1 year ago

dqwu commented 1 year ago

OLCF's Frontier uses Orion, a parallel file system built on Lustre and HPE ClusterStor. To achieve optimal I/O performance, we apply the recommended striping factor and striping unit within the SCORPIO library, similar to what we did for NERSC's Perlmutter.

Follows up PRs #412 and #488.

dqwu commented 1 year ago

On Frontier, E3SM developers completed an ne1024 run on 2K nodes with this branch:

    "avg_wtput(MB/s)" : 25179.191303
    "avg_rtput(MB/s)" : 30690.303796
    "tot_wb(bytes)" : 2905268883619
    "tot_rb(bytes)" : 2263262182988
    "tot_wtime(s)" : 110.038495
    "tot_rtime(s)" : 70.328891
    "tot_time(s)" : 277.265147
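As a quick sanity check on the summary above, the average throughputs are just the total bytes divided by the total I/O times. A minimal sketch (assuming "MB" in the log means MiB, i.e. 2^20 bytes, which is consistent with the reported figures):

```python
# Recompute average write/read throughput from the reported totals.
MIB = 2 ** 20

tot_wb, tot_wtime = 2905268883619, 110.038495   # write: bytes, seconds
tot_rb, tot_rtime = 2263262182988, 70.328891    # read: bytes, seconds

avg_wtput = tot_wb / tot_wtime / MIB
avg_rtput = tot_rb / tot_rtime / MIB

print(f"avg_wtput(MB/s) ~= {avg_wtput:.2f}")  # matches the reported ~25179 MB/s
print(f"avg_rtput(MB/s) ~= {avg_rtput:.2f}")  # matches the reported ~30690 MB/s
```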
sarats commented 1 year ago

To confirm, do you always set striping to 64 regardless of file size? Or is there some internal logic in Scorpio?

Ref: https://www.olcf.ornl.gov/wp-content/uploads/May2023_Usercall_OLCFStorage.pdf

dqwu commented 1 year ago

> To confirm, do you always set striping to 64 regardless of file size? Or is there some internal logic in Scorpio?
>
> Ref: https://www.olcf.ornl.gov/wp-content/uploads/May2023_Usercall_OLCFStorage.pdf

@sarats The default striping factor is 64 regardless of file size. CMake configuration output:

-- Limiting the number of Lustre OSTs used to PIO_MAX_LUSTRE_OSTS = 64 (default for OLCF Frontier)
-- Using filesystem striping unit, PIO_STRIPING_UNIT = 16777216 (default for OLCF Frontier)

For Perlmutter this default value is 72 (recommended by NERSC).

sarats commented 1 year ago

Did you run any experiments with > 64 stripes? Are there any known scenarios with 16TB+ files?

sarats commented 1 year ago

Looks like SCREAM doesn't expect to write 16TB+ files in the near future.

Did you notice any perf degradation for smaller files with a stripe count of 64?

dqwu commented 1 year ago

> Did you run any experiments with > 64 stripes? Are there any known scenarios with 16TB+ files?

I have not run an experiment with 16TB+ files, but the default value of 64 should be a good choice so far (comparable to 72 for Perlmutter; using more OSTs might affect other users). PS: you can always override this default by passing the "-DPIO_MAX_LUSTRE_OSTS=XXX" CMake option to SCORPIO.
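For illustration, the configure-time override and the equivalent manual Lustre layout look roughly like this. This is a sketch: the value 128 and the paths are placeholders, and only `-DPIO_MAX_LUSTRE_OSTS` is confirmed above as a SCORPIO CMake option; `lfs setstripe`/`lfs getstripe` are the standard Lustre tools, independent of SCORPIO's internal mechanism.

```shell
# Override SCORPIO's default OST cap at configure time
# (128 is illustrative, not a recommendation):
cmake -DPIO_MAX_LUSTRE_OSTS=128 ../scorpio

# For comparison, the default layout (64 stripes, 16 MiB stripe size)
# applied manually to an output directory with standard Lustre tools:
lfs setstripe -c 64 -S 16M /lustre/orion/my-output-dir
lfs getstripe /lustre/orion/my-output-dir
```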

dqwu commented 1 year ago

> Looks like SCREAM doesn't expect to write 16TB+ files in the near future.
>
> Did you notice any perf degradation for smaller files with a stripe count of 64?

For a benchmark F case which writes a history file of about 150 GB:

    striping factor = 1,   striping unit = 16M: 1264.547157 MB/s
    striping factor = 16,  striping unit = 16M: 7969.826210 MB/s
    striping factor = 64,  striping unit = 16M: 12301.026504 MB/s
    striping factor = 128, striping unit = 16M: 14339.478563 MB/s
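The four data points above can be reduced to relative speedups to make the diminishing returns explicit; a small sketch using just the reported numbers:

```python
# Write bandwidth (MB/s) vs. Lustre stripe count, from the benchmark above.
bw = {1: 1264.547157, 16: 7969.826210, 64: 12301.026504, 128: 14339.478563}

base = bw[1]
for stripes, rate in bw.items():
    print(f"{stripes:>3} stripes: {rate:12.2f} MB/s  ({rate / base:5.2f}x vs. 1 stripe)")
```

Going from 64 to 128 stripes buys only about a 1.17x further gain, consistent with 64 being a reasonable default for files of this size.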