LLNL / scr

SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability for MPI codes.
http://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi

SCR compression #205

Open · tonyhutter opened this issue 4 years ago

tonyhutter commented 4 years ago

We should consider adding compression in SCR. We mention wanting to do this on the project page (https://computing.llnl.gov/projects/scalable-checkpoint-restart-for-mpi) and in src/scr_io.c:

/* TODO: could perhaps use O_DIRECT here as an optimization */
/* TODO: could apply compression/decompression here */
/* copy src_file (full path) to dest_path and return new full path in dest_file */
int scr_file_copy(
  const char* src_file,
  const char* dst_file,
  unsigned long buf_size,
  uLong* crc)
{
...

I can see cases where it would be beneficial and cases where it wouldn't. If we did it, I'd recommend we use zstandard (https://github.com/facebook/zstd), which currently offers one of the best speed/ratio trade-offs among general-purpose compressors.
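
For reference, here is a rough sketch (not SCR code) of how compression could be folded into a buffer-at-a-time copy using zstd's streaming API; the helper name copy_compressed, the fixed level 3, and the abbreviated error handling are assumptions for illustration only:

#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>

/* copy src_file to dst_file, compressing with zstd along the way */
int copy_compressed(const char* src_file, const char* dst_file, size_t buf_size)
{
  FILE* fin  = fopen(src_file, "rb");
  FILE* fout = fopen(dst_file, "wb");
  if (fin == NULL || fout == NULL) {
    return 1;
  }

  ZSTD_CCtx* cctx = ZSTD_createCCtx();
  ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);

  size_t out_size = ZSTD_CStreamOutSize(); /* recommended output chunk size */
  void* in_buf  = malloc(buf_size);
  void* out_buf = malloc(out_size);

  size_t nread;
  do {
    nread = fread(in_buf, 1, buf_size, fin);

    /* a short read means we hit EOF, so finish the zstd frame */
    ZSTD_EndDirective mode = (nread < buf_size) ? ZSTD_e_end : ZSTD_e_continue;
    ZSTD_inBuffer input = { in_buf, nread, 0 };

    int finished = 0;
    do {
      ZSTD_outBuffer output = { out_buf, out_size, 0 };
      size_t remaining = ZSTD_compressStream2(cctx, &output, &input, mode);
      if (ZSTD_isError(remaining)) {
        return 1; /* real code would clean up and report the error */
      }
      fwrite(out_buf, 1, output.pos, fout);

      /* keep flushing until the input is consumed; on the last chunk,
       * keep going until zstd reports the frame is fully written */
      finished = (mode == ZSTD_e_end) ? (remaining == 0)
                                      : (input.pos == input.size);
    } while (!finished);
  } while (nread == buf_size);

  ZSTD_freeCCtx(cctx);
  free(in_buf);
  free(out_buf);
  fclose(fin);
  fclose(fout);
  return 0;
}

Memory stays bounded at roughly buf_size plus ZSTD_CStreamOutSize() per file, which matters if many ranks compress node-local checkpoints at the same time. The existing crc argument of scr_file_copy could still be updated on the uncompressed bytes inside the read loop.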

adammoody commented 4 years ago

We have had research efforts looking at compression in the past (both lossless and lossy). For lossless compression, I think we got about 10-20% savings in size. Most of the data in these applications is floating point, which is difficult to compress losslessly. Lossy compression can do much better, but for that one has to work with the application developers to figure out how much loss is tolerable.
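
To make the "tolerable loss" knob concrete, here is a minimal sketch of lossy compression with an application-specified absolute error bound, using zfp's fixed-accuracy mode as just one example of a floating-point compressor; zfp is not part of SCR, and the helper below is purely illustrative:

#include <stdlib.h>
#include <zfp.h>

/* compress n doubles under an absolute error tolerance agreed on with the
 * application; returns the compressed size (0 on failure) and the buffer */
size_t compress_with_tolerance(double* data, size_t n, double tolerance, void** out_buf)
{
  zfp_field*  field = zfp_field_1d(data, zfp_type_double, n);
  zfp_stream* zfp   = zfp_stream_open(NULL);

  /* fixed-accuracy mode: reconstructed values differ by at most 'tolerance' */
  zfp_stream_set_accuracy(zfp, tolerance);

  size_t max_size = zfp_stream_maximum_size(zfp, field);
  *out_buf = malloc(max_size);

  bitstream* stream = stream_open(*out_buf, max_size);
  zfp_stream_set_bit_stream(zfp, stream);
  zfp_stream_rewind(zfp);

  size_t compressed_size = zfp_compress(zfp, field);

  zfp_field_free(field);
  zfp_stream_close(zfp);
  stream_close(stream);
  return compressed_size;
}

The important part is the tolerance parameter: it is exactly the "how much loss is tolerable" question, and it has to come from the application rather than from SCR.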

There were some old compression functions in SCR that I had experimented with once. In case that's helpful later: https://github.com/LLNL/scr/blob/legacy/src/scr_compress.c

A little off topic, but related... for users who are willing to let us compress their files, they might also be willing to let us combine their many files into fewer files. We had something like that in the old SCR called "containers". This basically appended data from MPI ranks back-to-back into large, fixed-size container files. For that, you have to do some math to determine where each rank needs to write its data in those container files:

https://github.com/LLNL/scr/blob/68414920bf40f85afce8c88c9b042ba30a928f49/src/scr_flush.c#L394
https://github.com/LLNL/scr/blob/68414920bf40f85afce8c88c9b042ba30a928f49/src/scr_flush.c#L649
https://github.com/LLNL/scr/blob/68414920bf40f85afce8c88c9b042ba30a928f49/src/scr_flush_sync.c#L100
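
A minimal sketch of that offset arithmetic, assuming fixed-size containers and an exclusive prefix sum over per-rank byte counts (container_size and my_bytes are illustrative names, not SCR's actual metadata fields):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  unsigned long container_size = 1024UL * 1024UL * 1024UL; /* e.g. 1 GiB containers */
  unsigned long my_bytes = 300UL * 1024UL * 1024UL;        /* this rank's checkpoint size */

  /* exclusive prefix sum gives each rank's starting offset in the logical byte stream */
  unsigned long my_offset = 0;
  MPI_Exscan(&my_bytes, &my_offset, 1, MPI_UNSIGNED_LONG, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) {
    my_offset = 0; /* MPI_Exscan leaves rank 0's result undefined */
  }

  /* map the logical offset to (container index, offset within container);
   * a rank's data may straddle a container boundary, hence the loop */
  unsigned long remaining = my_bytes;
  unsigned long offset    = my_offset;
  while (remaining > 0) {
    unsigned long container_id  = offset / container_size;
    unsigned long container_off = offset % container_size;
    unsigned long chunk = container_size - container_off;
    if (chunk > remaining) {
      chunk = remaining;
    }
    printf("rank %d writes %lu bytes to container %lu at offset %lu\n",
           rank, chunk, container_id, container_off);
    offset    += chunk;
    remaining -= chunk;
  }

  MPI_Finalize();
  return 0;
}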

That layout information was maintained in the SCR metadata for the dataset, and we used it to read the files back out:

https://github.com/LLNL/scr/blob/68414920bf40f85afce8c88c9b042ba30a928f49/src/scr_fetch.c#L141

This data compression and file aggregation would be very useful for the memory-based interface of VeloC.