LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

Parameterize sleep period when polling async flush #498

Closed adammoody closed 2 years ago

adammoody commented 2 years ago

When waiting for an async flush to finish, SCR sleeps between polls to avoid busy-spinning on the CPU, which could compete with the processes that are conducting the flush. This PR adds a new SCR_FLUSH_ASYNC_USLEEP parameter that lets the user set the sleep time. It also lowers the default value from 10 seconds to 1 millisecond.
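
For illustration, the polling pattern described above might look roughly like the sketch below. This is not SCR's actual implementation. The helper names (`flush_async_usleep`, `wait_for_flush`, `countdown_done`) are hypothetical, and the sketch assumes that the parameter is read from an environment variable of the same name and that, as the `USLEEP` suffix suggests, its value is in microseconds.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

/* Default sleep between polls: 1 millisecond, expressed in microseconds. */
#define DEFAULT_FLUSH_ASYNC_USLEEP 1000L

/* Hypothetical helper: read the sleep period from the environment.
 * Assumes the value is a non-negative integer number of microseconds;
 * anything else falls back to the default. */
static long flush_async_usleep(void)
{
    const char *val = getenv("SCR_FLUSH_ASYNC_USLEEP");
    if (val == NULL || *val == '\0') {
        return DEFAULT_FLUSH_ASYNC_USLEEP;
    }

    errno = 0;
    char *end = NULL;
    long usecs = strtol(val, &end, 10);
    if (errno != 0 || end == val || *end != '\0' || usecs < 0) {
        return DEFAULT_FLUSH_ASYNC_USLEEP;
    }
    return usecs;
}

/* Callback that returns nonzero once the flush has completed. */
typedef int (*flush_done_fn)(void *state);

/* Hypothetical wait loop: check for completion, and sleep between checks
 * rather than spinning. Returns the number of completion checks made. */
static long wait_for_flush(flush_done_fn is_done, void *state)
{
    assert(is_done != NULL);

    long usecs = flush_async_usleep();
    struct timespec ts;
    ts.tv_sec = usecs / 1000000L;
    ts.tv_nsec = (usecs % 1000000L) * 1000L;

    long polls = 1;
    while (!is_done(state)) {
        nanosleep(&ts, NULL); /* yield the CPU to the flushing processes */
        polls++;
    }
    return polls;
}

/* Stand-in for a real completion test: reports "done" once the counter
 * pointed to by state has been decremented to zero. */
static int countdown_done(void *state)
{
    int *remaining = state;
    if (*remaining == 0) {
        return 1;
    }
    (*remaining)--;
    return 0;
}
```

Under these assumptions, a shorter sleep reduces the delay between the flush finishing and the waiting process noticing it, at the cost of more frequent wakeups. A user who saw contention with the flushing processes could raise the value again.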