Closed dqwu closed 1 year ago
On Frontier, E3SM developers completed an ne1024 run on 2K nodes with this branch. I/O statistics from that run:
"avg_wtput(MB/s)" : 25179.191303
"avg_rtput(MB/s)" : 30690.303796
"tot_wb(bytes)" : 2905268883619
"tot_rb(bytes)" : 2263262182988
"tot_wtime(s)" : 110.038495
"tot_rtime(s)" : 70.328891
"tot_time(s)" : 277.265147
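As a quick sanity check, the average throughputs above can be reproduced from the byte totals and times. This is a minimal sketch, assuming that "MB" here means 2**20 bytes; the unit is inferred from the numbers, not taken from SCORPIO documentation. The helper name `throughput_mib_per_s` is made up for illustration.

```python
# Assumption: "MB/s" in the stats is bytes / 2**20 per second (inferred, not documented here).
MIB = 1024 * 1024


def throughput_mib_per_s(total_bytes: int, seconds: float) -> float:
    """Return average throughput in units of 2**20 bytes per second."""
    return total_bytes / seconds / MIB


# Values copied from the run statistics above.
stats = {
    "tot_wb(bytes)": 2905268883619,
    "tot_rb(bytes)": 2263262182988,
    "tot_wtime(s)": 110.038495,
    "tot_rtime(s)": 70.328891,
}

write_tput = throughput_mib_per_s(stats["tot_wb(bytes)"], stats["tot_wtime(s)"])
read_tput = throughput_mib_per_s(stats["tot_rb(bytes)"], stats["tot_rtime(s)"])

print(f"write: {write_tput:.3f} MB/s")
print(f"read:  {read_tput:.3f} MB/s")
```

Both results come out very close to the reported avg_wtput and avg_rtput, which suggests those averages are simply total bytes divided by total write or read time.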
To confirm, do you always set striping to 64 regardless of file size? Or is there some internal logic in Scorpio?
Ref: https://www.olcf.ornl.gov/wp-content/uploads/May2023_Usercall_OLCFStorage.pdf
@sarats The default striping factor is 64 regardless of file size. CMake configuration output:
-- Limiting the number of Lustre OSTs used to PIO_MAX_LUSTRE_OSTS = 64 (default for OLCF Frontier)
-- Using filesystem striping unit, PIO_STRIPING_UNIT = 16777216 (default for OLCF Frontier)
For Perlmutter this default value is 72 (recommended by NERSC).
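For readers less familiar with the two settings, here is a simplified sketch of what a striping factor of 64 and a striping unit of 16777216 bytes mean for a file's layout. It assumes plain round-robin (RAID-0 style) Lustre striping and ignores which physical OSTs are chosen; `stripe_index` is a hypothetical helper, not a SCORPIO or Lustre API.

```python
# Defaults reported in the CMake output above for OLCF Frontier.
STRIPE_COUNT = 64        # PIO_MAX_LUSTRE_OSTS
STRIPE_UNIT = 16777216   # PIO_STRIPING_UNIT, i.e. 16 * 2**20 bytes


def stripe_index(offset: int,
                 stripe_unit: int = STRIPE_UNIT,
                 stripe_count: int = STRIPE_COUNT) -> int:
    """Index (0..stripe_count-1) of the stripe object holding a byte offset.

    Simplified model: consecutive stripe_unit-sized chunks are assigned to
    stripe objects in round-robin order.
    """
    return (offset // stripe_unit) % stripe_count


# Bytes written before the layout wraps around to the first stripe object.
full_stripe_width = STRIPE_COUNT * STRIPE_UNIT

print(stripe_index(0))                  # first chunk
print(stripe_index(STRIPE_UNIT))        # second chunk
print(stripe_index(full_stripe_width))  # wraps back to the first stripe object
print(full_stripe_width)                # 64 * 16 MiB
```

Under this model, each contiguous 1 GiB of the file touches all 64 stripe objects once.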
Did you run any experiments with > 64 stripes? Are there any known scenarios with 16TB+ files?
Looks like SCREAM doesn't expect to write 16TB+ files in the near future.
Did you notice any perf degradation for smaller files with a stripe count of 64?
> Did you run any experiments with > 64 stripes? Are there any known scenarios with 16TB+ files?
I have not run an experiment with 16TB+ files, but the default value of 64 should be a good choice for now (it is comparable to 72 on Perlmutter, and using a larger number of OSTs might affect other users). Note that you can always override this default by passing the "-DPIO_MAX_LUSTRE_OSTS=XXX" CMake option to SCORPIO.
> Looks like SCREAM doesn't expect to write 16TB+ files in the near future.
> Did you notice any perf degradation for smaller files with a stripe count of 64?
For a benchmark F case which writes a history file of about 150 GB:
striping factor = 1, striping unit = 16M: 1264.547157 MB/s
striping factor = 16, striping unit = 16M: 7969.826210 MB/s
striping factor = 64, striping unit = 16M: 12301.026504 MB/s
striping factor = 128, striping unit = 16M: 14339.478563 MB/s
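To make the scaling easier to read, the benchmark numbers above can be turned into speedups relative to a striping factor of 1. This is only arithmetic on the four reported values; the variable names are illustrative.

```python
# Write throughput (MB/s) for the ~150 GB history file, keyed by striping factor.
# All runs used a striping unit of 16M. Values copied from the benchmark above.
results_mb_per_s = {
    1: 1264.547157,
    16: 7969.826210,
    64: 12301.026504,
    128: 14339.478563,
}

baseline = results_mb_per_s[1]

# Speedup of each configuration relative to a single stripe.
speedup = {factor: tput / baseline for factor, tput in results_mb_per_s.items()}

for factor, s in speedup.items():
    print(f"striping factor {factor:>3}: {s:.2f}x over factor 1")

# Relative gain from doubling the stripe count from 64 to 128.
gain_64_to_128 = results_mb_per_s[128] / results_mb_per_s[64]
print(f"64 -> 128: {gain_64_to_128:.2f}x")
```

Going from 1 to 16 stripes gives roughly a 6.3x improvement, while doubling from 64 to 128 gives only about 1.17x, so the returns diminish well before 128 stripes for a file of this size.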
OLCF's Frontier uses Orion, a parallel file system built on Lustre and HPE ClusterStor. To achieve optimal I/O performance, we apply the recommended striping factor and striping unit within the SCORPIO library, similar to what we did for NERSC's Perlmutter.
Follows up PR #412 and #488