accel-sim / accel-sim-framework

This is the top-level repository for the Accel-Sim framework.
https://accel-sim.github.io

Support for xz-compressed traces #265

Closed FJShen closed 6 months ago

FJShen commented 7 months ago

Raw (pre-processing) trace files can be compressed to roughly 10% of the raw text file size, and post-processed trace files to about 1%. Trading CPU time for compression/decompression makes sense on multi-core, IO-limited systems.

This commit brings support for xz-compressed trace files. All three of the tracer, the trace post-processor, and the trace-driven simulator have been upgraded to support xz-compressed traces.

Run export TRACE_FILE_COMPRESS=1 before tracing to have the tracer write xz-compressed trace files directly.

The commit remains backward compatible with raw text trace files. It automatically determines whether a trace file needs to be decompressed or can be consumed as raw text.
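The description above does not say how the format is detected. One plausible approach, shown here as a minimal sketch and not necessarily what this commit does, is to check for the six-byte magic number that opens every .xz stream. The helper name looks_like_xz is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Magic number at the start of every .xz stream: FD 37 7A 58 5A 00.
static const unsigned char kXzMagic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};

// Hypothetical helper (not from the PR): returns true if the file at `path`
// begins with the xz magic bytes. Unreadable or too-short files return false,
// so the caller would fall back to treating them as raw text.
bool looks_like_xz(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char header[sizeof(kXzMagic)] = {0};
    in.read(reinterpret_cast<char*>(header), sizeof(header));
    return static_cast<std::size_t>(in.gcount()) == sizeof(header) &&
           std::memcmp(header, kXzMagic, sizeof(header)) == 0;
}
```

Checking the content rather than the ".xz" suffix would keep detection working even if a file is renamed, though a suffix check would be simpler and may be what the commit actually uses.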

Compression and decompression are transparently delegated to a child process running xz in a bash shell; the trace data is transmitted between accel-sim and xz over UNIX anonymous pipes. The compression level (and the compression scheme itself) were chosen to yield a good compression ratio without becoming a bottleneck for the tracer, post-processor, or simulator. xz is preferred over gzip and bzip2 for its superior compression ratio and its support for multi-threading.
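For readers unfamiliar with the pattern, the child-process-plus-pipe idea can be sketched with POSIX popen, which creates the pipe and spawns the shell in one call. This is an illustration under stated assumptions: the function names are hypothetical, popen runs the command through /bin/sh rather than bash, and the actual commit may set up its pipes differently.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): runs `command` in a shell via popen
// and returns everything it writes to stdout, split into lines.
std::vector<std::string> read_lines_from_command(const std::string& command) {
    FILE* pipe = popen(command.c_str(), "r");
    if (!pipe) throw std::runtime_error("popen failed: " + command);

    std::vector<std::string> lines;
    std::string current;
    char buf[4096];
    while (fgets(buf, sizeof(buf), pipe)) {
        current += buf;  // a long line may arrive in several chunks
        if (!current.empty() && current.back() == '\n') {
            current.pop_back();
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) lines.push_back(current);  // final line without '\n'

    // pclose waits for the child; a nonzero status means the command failed.
    if (pclose(pipe) != 0) throw std::runtime_error("command failed: " + command);
    return lines;
}

// Hypothetical wrapper: decompress an xz trace to stdout and read it.
// Assumes `path` contains no single quotes.
std::vector<std::string> read_xz_trace(const std::string& path) {
    return read_lines_from_command("xz -dc '" + path + "'");
}
```

A real trace reader would more likely consume lines as they arrive instead of collecting them all in memory; the vector is used here only to keep the sketch short.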

To upgrade (i.e., compress) legacy trace files, compress them with xz (e.g. xz -1 -T0 kernel-1.traceg) and update the kernelslist.g file so that each trace file name matches ("kernel-1.traceg" -> "kernel-1.traceg.xz"). Pass --keep to xz to keep the original trace file if you want to be cautious.
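The kernelslist.g rename can be scripted. The sketch below shows only the per-line rule, assuming each trace entry sits on its own line and ends exactly in ".traceg"; the function name is hypothetical and not part of this PR.

```cpp
#include <string>

// Hypothetical helper (not from the PR): given one line of kernelslist.g,
// append ".xz" if the line names a raw .traceg file and return any other
// line unchanged. Assumes no trailing whitespace or carriage return.
std::string upgrade_kernelslist_line(const std::string& line) {
    const std::string suffix = ".traceg";
    const bool is_raw_trace =
        line.size() >= suffix.size() &&
        line.compare(line.size() - suffix.size(), suffix.size(), suffix) == 0;
    return is_raw_trace ? line + ".xz" : line;
}
```

Because lines that already end in ".traceg.xz" do not match the suffix, applying the rule twice is harmless.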

Convenience commands such as xzgrep, xzless, and xzmore are available if you need to search or read a compressed trace.

This PR also reduces the trace post-processor's memory footprint. A WarpInstLUT (warp instruction look-up table) is introduced to register recurring string fragments of warp instructions. Warp instructions with identical fragments hold a pointer to a single global copy of the fragment, so the memory overhead of warp instructions is effectively reduced. In one real-life test, a trace file that is 500GB in size (pre-processing) incurred only a ~150GB memory footprint.
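The look-up table idea is essentially string interning. A minimal sketch follows, assuming fragments are whole strings keyed by value; FragmentTable and WarpInst are hypothetical stand-ins, not the PR's actual WarpInstLUT interface or data layout.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the PR's WarpInstLUT: stores each distinct
// fragment once and hands out pointers to the shared copy.
class FragmentTable {
public:
    // Returns a pointer to the single stored copy of `fragment`, inserting
    // it first if it has not been seen before.
    const std::string* intern(const std::string& fragment) {
        return &*table_.insert(fragment).first;
    }

    std::size_t size() const { return table_.size(); }

private:
    // std::unordered_set is node-based, so pointers to its elements remain
    // valid when the table rehashes as it grows.
    std::unordered_set<std::string> table_;
};

// Hypothetical instruction record: two pointers instead of two owned strings.
struct WarpInst {
    const std::string* opcode;
    const std::string* operands;
};
```

With this layout, a million instructions sharing the same opcode text cost one stored string plus a million pointers, which is the kind of saving the paragraph above describes.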

Situations of error and exception:

tgrogers commented 6 months ago

Hey @Connie120, can you take a look at the code this weekend? I will get Jenkins back up shortly.