adammoody closed 5 months ago
To run SCR with async flush:
export SCR_DEBUG=1
export SCR_CACHE_BYPASS=0
export SCR_FLUSH=1
export SCR_FLUSH_ASYNC=1
export SCR_FLUSH_TYPE=PTHREAD
This configures SCR to write checkpoints to /dev/shm, then flush every checkpoint asynchronously using pthreads.
To run:
srun -n4 ./test_api --times 10 --size 1GB --reduce 100000
This will write 10 checkpoints, where each file is 1 GB in size. It will execute 100,000 iterations of the work loop between checkpoints.
If the work loop runs long enough for a flush to complete before test_api reaches its next checkpoint, SCR can be configured so that a single run shows the cost of a work loop both with and without an async flush:
export SCR_FLUSH=2
With this, SCR flushes every other checkpoint, so some compute timesteps include a background flush and some do not. That may make it easier to get an apples-to-apples comparison.
The worst case for interference should occur when the flush finishes just before the current work loop ends, so that the flush is running for the full duration of the timestep.
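Once per-timestep work-loop times are available, one way to make that comparison is to split them by whether a flush was running in the background. The sketch below is only an illustration: the function name `flush_overhead` and the input format (a list of `(seconds, flush_active)` pairs) are assumptions, not something SCR or test_api produces directly, and the sample numbers are made up.

```python
from statistics import mean


def flush_overhead(timesteps):
    """Compare work-loop times with and without a background flush.

    `timesteps` is a list of (seconds, flush_active) pairs. This input
    format is hypothetical; the caller decides which timesteps overlapped
    a flush. Returns (mean_with, mean_without, relative_slowdown).
    """
    with_flush = [t for t, active in timesteps if active]
    without_flush = [t for t, active in timesteps if not active]
    if not with_flush or not without_flush:
        raise ValueError("need at least one timestep of each kind")
    m_with = mean(with_flush)
    m_without = mean(without_flush)
    # Relative slowdown of 0.1 means flushed timesteps took 10% longer.
    return m_with, m_without, (m_with - m_without) / m_without


if __name__ == "__main__":
    # Made-up times, alternating as they might with SCR_FLUSH=2.
    sample = [(10.0, False), (11.0, True), (10.0, False), (11.0, True)]
    print(flush_overhead(sample))
```

A relative slowdown near zero would suggest the work kernel is largely unaffected by the background flush.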
Other variations:
# sync (blocking) flush instead of async (background) flush
SCR_FLUSH_ASYNC=0
# for sync flush operations,
# copy data with main thread instead of spawning a pthread
SCR_FLUSH_TYPE=SYNC
# disable flush completely
SCR_FLUSH=0
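For scripted experiments, the settings above can be collected into named variants and applied to the environment of each run. This is a minimal sketch: the helper `scr_env` and the variant names are hypothetical, and pairing `SCR_FLUSH_TYPE=SYNC` with `SCR_FLUSH_ASYNC=0` is an interpretation of the comments above. Only the `SCR_*` variables and their values come from this description.

```python
import os

# Baseline from the async flush setup in this description.
BASE = {
    "SCR_DEBUG": "1",
    "SCR_CACHE_BYPASS": "0",
    "SCR_FLUSH": "1",
    "SCR_FLUSH_ASYNC": "1",
    "SCR_FLUSH_TYPE": "PTHREAD",
}

# Variant names are made up for illustration; values override BASE.
VARIANTS = {
    "async_pthread": {},
    "sync_flush": {"SCR_FLUSH_ASYNC": "0"},
    "sync_main_thread": {"SCR_FLUSH_ASYNC": "0", "SCR_FLUSH_TYPE": "SYNC"},
    "flush_every_other": {"SCR_FLUSH": "2"},
    "no_flush": {"SCR_FLUSH": "0"},
}


def scr_env(variant, base_env=None):
    """Return an environment dict for one run of the given variant."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(BASE)
    env.update(VARIANTS[variant])
    return env
```

The result could then be passed as the `env` argument of `subprocess.run` when launching the `srun` command shown earlier, so that each variant runs with its own settings and leaves the calling shell unchanged.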
We have a Python script that parses the output from SCR_DEBUG to compute statistics, e.g., the total, mean, and stddev of checkpoint costs. This might help when running experiments.
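The statistics themselves are easy to reproduce. The sketch below is not the script mentioned above. It assumes the checkpoint costs have already been extracted into a list of seconds, since the SCR_DEBUG output format is not shown here, and `summarize` is a hypothetical name.

```python
from statistics import mean, stdev


def summarize(costs):
    """Return count, total, mean, and sample stddev of checkpoint costs.

    `costs` is a list of per-checkpoint times in seconds (assumed input;
    extracting these from SCR_DEBUG output is left to the caller).
    """
    if not costs:
        raise ValueError("no checkpoint costs given")
    return {
        "count": len(costs),
        "total": sum(costs),
        "mean": mean(costs),
        # Sample standard deviation needs at least two values.
        "stddev": stdev(costs) if len(costs) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Made-up costs for illustration.
    print(summarize([2.0, 4.0, 6.0]))
```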
@adammoody We are planning on spinning a release in the next few weeks. Is this PR ready to merge?
@adammoody, I think you mentioned that you wanted to add a configuration for switching between the kernels.
@gonsie I have tested test_api in MPI and non-MPI runs, with and without the kernel flag. IMO, this is ready for review.
To study and improve async flush performance in SCR, this extends test_api.c to execute various work kernels. By focusing on certain kinds of operations (CPU intensive, memory intensive, network intensive, etc.), the idea is to identify which types of operations are most susceptible to interference from a background flush.