argonne-lcf / user-guides

ALCF Systems User Documentation
https://docs.alcf.anl.gov/

Request: example of using node-local storage on Polaris #146

Open felker opened 1 year ago

felker commented 1 year ago

If someone has a more sophisticated approach to writing output files to /local/scratch and transferring to persistent project storage on Eagle or Grand besides a simple cp/mv/rsync at the end of the PBS job script, that would be a great addition to https://docs.alcf.anl.gov/polaris/queueing-and-running-jobs/example-job-scripts or somewhere else.
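For reference, the simple end-of-job copy mentioned above could look roughly like the sketch below. It is a single-node baseline under stated assumptions, not a tested ALCF recipe: the project name `MyProject`, the executable `my_app`, the destination layout under `/eagle`, and the PBS resource lines are all placeholders to adapt. The `stage_out` helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/bash
#PBS -l select=1
#PBS -l walltime=00:30:00
#PBS -l filesystems=home:eagle
#PBS -A MyProject
# NOTE: the directives above are assumptions; adjust queue, project, and filesystems.

# stage_out SRC DEST: copy everything under SRC (including dotfiles) into DEST.
stage_out() {
    local src="$1" dest="$2"
    mkdir -p "$dest" || return 1
    cp -a "$src/." "$dest/"
}

main() {
    # Short job tag, e.g. "12345" from "12345.some-pbs-server"
    local job_tag="${PBS_JOBID%%.*}"
    # Per-job working directory on the node-local SSD
    local local_dir="/local/scratch/${USER}_${job_tag}"
    # Hypothetical destination on persistent project storage
    local dest_dir="/eagle/MyProject/${USER}/output_${job_tag}"

    mkdir -p "$local_dir" || return 1
    cd "$local_dir" || return 1

    # Hypothetical application; it writes its output into the current directory.
    "${PBS_O_WORKDIR}/my_app" > run.log 2>&1
    local status=$?

    # Copy results back even if the application failed, then clean up the SSD
    # only when the copy succeeded.
    stage_out "$local_dir" "$dest_dir" && rm -rf "$local_dir"
    return "$status"
}

# Only run the workflow inside a PBS job, so the file can be sourced safely elsewhere.
if [ -n "${PBS_JOBID:-}" ]; then
    main
    exit $?
fi
```

Two limitations of this baseline are worth stating. If the job is killed at the walltime limit, the copy step never runs, so the application needs to finish with enough margin for the transfer. On multiple nodes, each node has its own `/local/scratch`, so the copy would have to be launched once per node rather than only from the node running the script.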

kevin-harms commented 1 year ago

ALCF doesn't have any software to do this. There are some ECP projects that potentially could help but I'm not sure if any of them have tested with Polaris.

felker commented 1 year ago

Even absent special software, we should add example use cases and job scripts.

NERSC doesn't seem to have such documentation either, since their node-local SSDs are limited to a few large-memory nodes on Perlmutter and Cori. They do have an all-flash Lustre $SCRATCH on Perlmutter, which we don't have, and the old, now-discouraged Cori burst buffer (https://docs.nersc.gov/filesystems/cori-burst-buffer/), which isn't quite analogous since it isn't node-local and it also supported MPI-I/O.

Do we know of any users on Polaris who currently make use of the node-local SSDs? I asked around and only found folks with ThetaKNL experience.