LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

simulate different work kernels in test_api #537

Closed adammoody closed 5 months ago

adammoody commented 1 year ago

To study and improve async flush performance in SCR, this extends test_api.c to execute various work kernels. Each kernel focuses on one type of operation (e.g., CPU-intensive, memory-intensive, or network-intensive), with the goal of identifying which types of operations are most susceptible to interference from a background flush.

adammoody commented 1 year ago

To run SCR with async flush:

export SCR_DEBUG=1
export SCR_CACHE_BYPASS=0
export SCR_FLUSH=1
export SCR_FLUSH_ASYNC=1
export SCR_FLUSH_TYPE=PTHREAD

This configures SCR to write checkpoints to /dev/shm, then flush every checkpoint using async flush with pthreads.

To run:

srun -n4 ./test_api --times 10 --size 1GB --reduce 100000

This will write 10 checkpoints, where each file is 1 GB in size. It will execute 100,000 iterations of the work loop between checkpoints.

If one ensures the work loop runs long enough so that a flush can complete before test_api reaches its next checkpoint, one can configure SCR to see the cost of a work loop with and without async flushes in a single run:

export SCR_FLUSH=2

With this, SCR would flush every other checkpoint. So some compute timesteps would include a background flush and some would not. It may be easier to get an apples-to-apples comparison that way.

The worst case for interference should occur when the flush finishes just before the current work loop ends, so that the flush is running for the full duration of the timestep.
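As a rough illustration of the SCR_FLUSH=2 comparison, the sketch below splits per-timestep work-loop times into those that overlapped a background flush and those that did not, then reports the relative slowdown. The function names, the `first_flushed` parameter, and the sample timings are hypothetical. Which timesteps actually overlap a flush depends on how SCR counts checkpoints, so check that against the debug output before trusting the split.

```python
from statistics import mean


def split_by_flush(work_times, flush_every=2, first_flushed=1):
    """Split work-loop times into (with_flush, without_flush).

    work_times    -- seconds per work loop, in timestep order
    flush_every   -- value of SCR_FLUSH (flush every Nth checkpoint)
    first_flushed -- index of the first work loop that overlaps a flush;
                     this is an assumption the caller must verify
    """
    with_flush, without_flush = [], []
    for step, seconds in enumerate(work_times):
        if step % flush_every == first_flushed % flush_every:
            with_flush.append(seconds)
        else:
            without_flush.append(seconds)
    return with_flush, without_flush


def flush_overhead(work_times, flush_every=2, first_flushed=1):
    """Relative slowdown of work loops that overlap a flush (0.2 means 20% slower)."""
    with_flush, without_flush = split_by_flush(work_times, flush_every, first_flushed)
    baseline = mean(without_flush)
    return (mean(with_flush) - baseline) / baseline


if __name__ == "__main__":
    # Placeholder timings in seconds, not measurements from a real run.
    sample = [10.0, 12.0, 10.0, 12.0]
    print(f"overhead: {flush_overhead(sample):.1%}")
```

Comparing means within a single run keeps node allocation and file system load the same for both groups, which is the point of the every-other-checkpoint setup.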

Other variations:

# sync (blocking) flush instead of async (background) flush
SCR_FLUSH_ASYNC=0

# for sync flush operations,
# copy data with main thread instead of spawning a pthread
SCR_FLUSH_TYPE=SYNC

# disable flush completely
SCR_FLUSH=0
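If it helps when scripting experiments, the settings above could be collected into one helper that builds the environment for each variation. This is only a sketch: the variant names (`async_pthread`, `sync_pthread`, `sync_main_thread`, `no_flush`) and the `scr_env` function are made up for illustration, and only the environment variables themselves come from this thread.

```python
import os

# Baseline from the async flush settings in this thread.
BASE = {
    "SCR_DEBUG": "1",
    "SCR_CACHE_BYPASS": "0",
    "SCR_FLUSH": "1",
    "SCR_FLUSH_ASYNC": "1",
    "SCR_FLUSH_TYPE": "PTHREAD",
}

# Overrides per variation; the keys are hypothetical labels.
VARIANTS = {
    "async_pthread": {},
    "sync_pthread": {"SCR_FLUSH_ASYNC": "0"},
    "sync_main_thread": {"SCR_FLUSH_ASYNC": "0", "SCR_FLUSH_TYPE": "SYNC"},
    "no_flush": {"SCR_FLUSH": "0"},
}


def scr_env(variant, parent_env=None):
    """Return a copy of the parent environment with SCR settings applied."""
    env = dict(os.environ if parent_env is None else parent_env)
    env.update(BASE)
    env.update(VARIANTS[variant])
    return env


if __name__ == "__main__":
    for name in VARIANTS:
        env = scr_env(name, parent_env={})
        print(name, " ".join(f"{k}={v}" for k, v in sorted(env.items())))
    # To launch a run, pass the result to subprocess, for example:
    # subprocess.run(["srun", "-n4", "./test_api", "--times", "10",
    #                 "--size", "1GB", "--reduce", "100000"],
    #                env=scr_env("no_flush"))
```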

We have a python script that will parse the output from SCR_DEBUG to compute statistics, e.g., compute total, mean, and stddev of checkpoint costs. This might help when running experiments.
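For reference, the statistics step on its own is small. The sketch below is not the script mentioned above; it assumes the checkpoint costs have already been extracted from the SCR_DEBUG output into a list of seconds, since the log format is not shown in this thread. The `summarize` name is hypothetical, and it uses the sample standard deviation, which is an assumption about what "stddev" means here.

```python
from statistics import mean, stdev


def summarize(costs):
    """Return count, total, mean, and sample stddev for a list of costs in seconds."""
    costs = list(costs)
    if not costs:
        raise ValueError("no checkpoint costs to summarize")
    return {
        "count": len(costs),
        "total": sum(costs),
        "mean": mean(costs),
        # stdev() needs at least two values; report 0.0 for a single sample.
        "stddev": stdev(costs) if len(costs) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Placeholder values, not measurements from a real run.
    stats = summarize([1.0, 2.0, 3.0])
    for key, value in stats.items():
        print(f"{key}: {value}")
```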

gonsie commented 5 months ago

@adammoody We are planning on spinning a release in the next few weeks. Is this PR ready to merge?

hariharan-devarajan commented 5 months ago

@adammoody, I think you mentioned that you wanted to add a configuration option for switching between the kernels.

hariharan-devarajan commented 5 months ago

@gonsie I have tested test_api in MPI and non-MPI runs, with and without the kernel flag. IMO, this is ready for review.