Support for xz-compressed traces

Trace files (pre-processing) can be compressed to roughly 10% of their raw text size, and post-processed trace files to about 1%. Trading CPU time for compression and decompression makes sense on multi-core, I/O-limited systems.

This commit adds support for xz-compressed trace files. The tracer, the trace post-processor, and the trace-driven simulator have all been upgraded to support xz-compressed traces.

Set TRACE_FILE_COMPRESS=1 in the environment (export TRACE_FILE_COMPRESS=1) to have the tracer write xz-compressed trace files directly.
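As a minimal sketch of how a tool might consult this switch, the function below reads the variable with getenv. The function name is hypothetical, and treating only the exact value "1" as "enabled" is an assumption; the tracer's actual parsing may differ.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper, not taken from the PR.
// Assumption: only the exact value "1" turns compression on.
bool trace_compression_requested() {
  const char *value = std::getenv("TRACE_FILE_COMPRESS");
  return value != nullptr && std::strcmp(value, "1") == 0;
}
```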

The commit remains backward compatible with raw text trace files: it automatically determines whether a trace file must be decompressed or can be consumed as raw text.
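One way such a check could work is to look at the first bytes of the file, since every .xz stream starts with the 6-byte magic number FD 37 7A 58 5A 00. This is only a sketch under that assumption: the description does not say how the PR decides (it may simply look at the ".xz" file name suffix), and looks_like_xz is a hypothetical name.

```cpp
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper, not taken from the PR: returns true when the file
// begins with the magic number of the .xz container format.
bool looks_like_xz(const std::string &path) {
  static const unsigned char kXzMagic[6] = {0xFD, '7', 'z', 'X', 'Z', 0x00};
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;  // unreadable files are treated as "not xz"
  char header[sizeof(kXzMagic)] = {};
  in.read(header, sizeof(header));
  if (in.gcount() != static_cast<std::streamsize>(sizeof(header))) {
    return false;  // shorter than the magic number, so it cannot be xz
  }
  return std::memcmp(header, kXzMagic, sizeof(kXzMagic)) == 0;
}
```

A caller would open the file through xz when looks_like_xz returns true and read it as plain text otherwise.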

Compression and decompression are transparently delegated to a child process running xz in a bash shell. Trace data is transmitted between accel-sim and xz over UNIX anonymous pipes. The compression level (and the compression scheme itself) are chosen to yield a good compression ratio without becoming a bottleneck for the tracer, post-processor, or simulator. xz is preferred over gzip and bzip2 for its superior compression ratio and its support for multi-threading.
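The delegation described above can be sketched with popen, which hides the pipe/fork/exec steps and runs the command through /bin/sh (not necessarily bash, as the PR uses). All helper names here are hypothetical, and the -1 -T0 flags are borrowed from the upgrade example in this description; they are an assumption, not a statement of what the PR passes to xz.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Wraps a path in single quotes so the shell treats it as one word.
std::string shell_quote(const std::string &s) {
  std::string out = "'";
  for (char c : s) {
    if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
    else out += c;
  }
  out += "'";
  return out;
}

// Command whose stdout is the decompressed trace.
std::string xz_read_command(const std::string &path) {
  return "xz --decompress --stdout -- " + shell_quote(path);
}

// Command that compresses its stdin into `path` (flags are an assumption).
std::string xz_write_command(const std::string &path) {
  return "xz -1 -T0 --stdout > " + shell_quote(path);
}

// Runs `command` in a child shell and returns everything it writes to stdout.
std::string read_all_from(const std::string &command) {
  FILE *pipe = popen(command.c_str(), "r");
  if (pipe == nullptr) throw std::runtime_error("popen failed");
  std::string data;
  char buf[4096];
  std::size_t n;
  while ((n = fread(buf, 1, sizeof(buf), pipe)) > 0) data.append(buf, n);
  pclose(pipe);
  return data;
}
```

With xz installed, read_all_from(xz_read_command("kernel-1.traceg.xz")) would return the raw trace text. A real consumer would read from the pipe incrementally instead of buffering the whole trace in memory.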

To upgrade (i.e. to compress) legacy trace files, use xz to compress them (e.g. xz -1 -T0 kernel-1.traceg) and modify the kernelslist.g file to change the trace file name accordingly ("kernel-1.traceg" -> "kernel-1.traceg.xz"). Pass --keep to xz to keep the original trace file if you feel cautious.
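The renaming step for kernelslist.g can be sketched as below. The helper names are hypothetical, and the code assumes that a trace entry is a line ending exactly in ".traceg" (no trailing whitespace); every other line is copied through unchanged.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: appends ".xz" to a line that ends in ".traceg".
std::string upgrade_entry(const std::string &line) {
  const std::string suffix = ".traceg";
  if (line.size() >= suffix.size() &&
      line.compare(line.size() - suffix.size(), suffix.size(), suffix) == 0) {
    return line + ".xz";
  }
  return line;  // already upgraded, or not a trace entry
}

// Copies a kernel list from `in` to `out`, upgrading trace entries.
void upgrade_list(std::istream &in, std::ostream &out) {
  std::string line;
  while (std::getline(in, line)) out << upgrade_entry(line) << '\n';
}
```

Because entries that already end in ".traceg.xz" are left alone, running the rewrite twice on the same list is harmless.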

Convenience commands that ship with xz, such as xzgrep, xzless, and xzmore, are at your disposal if you need to search or read a compressed trace.

This PR also optimizes the trace post-processor's memory footprint. A WarpInstLUT, or warp instruction look-up table, is introduced to register recurring string fragments of warp instructions. Warp instructions with identical fragments hold a pointer to a single global copy of the fragment, which reduces the memory overhead of each warp instruction. In one real-life test, a 500GB trace file (pre-processing) incurred a memory footprint of only ~150GB.
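The idea behind the look-up table is string interning. The sketch below is not the PR's WarpInstLUT; FragmentTable and WarpInstSketch are hypothetical names, and splitting an instruction into an opcode and an operand fragment is an assumption made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Stores each distinct fragment once and hands out pointers to that copy.
class FragmentTable {
 public:
  // Returns a pointer to the single stored copy of `fragment`.
  // Pointers to unordered_set elements stay valid when the table rehashes.
  const std::string *intern(const std::string &fragment) {
    return &*table_.insert(fragment).first;
  }
  std::size_t size() const { return table_.size(); }

 private:
  std::unordered_set<std::string> table_;
};

// An instruction record holds two pointers instead of two owned strings.
struct WarpInstSketch {
  const std::string *opcode;
  const std::string *operands;
};
```

Two instructions that share an opcode then point at the same string object, so the cost of a repeated fragment is one pointer per instruction plus one stored copy overall.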

Error and exception cases:

Sending SIGINT to the main process when using GDB: the xz child process is configured to ignore SIGINT and keeps running while the parent process is being debugged. xz simply blocks on I/O until the parent process resumes.
The xz child process aborts for any reason (e.g. a corrupted file): Unimplemented. The parent process is not designed to handle this situation.
The parent process is killed or aborts while the xz child process is still alive: Linux does not normally kill a child process when its parent dies. However, the parent and its children communicate over pipes, and all of the parent's file descriptors are closed when it terminates. By pipe semantics, the compression process (reading from the pipe) then receives EOF and terminates naturally, while the decompression process (writing to the pipe) receives SIGPIPE and terminates.
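The SIGINT and pipe behaviors listed above can be reproduced in isolation. This is a sketch with hypothetical helper names, not code from the PR: the forked children stand in for xz, and the parent closes its pipe ends explicitly to mimic what happens to its descriptors when it terminates.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks and execs argv with SIGINT ignored in the child. An ignored
// disposition survives exec, so the new program keeps ignoring SIGINT.
pid_t spawn_ignoring_sigint(char *const argv[]) {
  pid_t pid = fork();
  if (pid == 0) {
    signal(SIGINT, SIG_IGN);
    execvp(argv[0], argv);
    _exit(127);  // only reached if exec failed
  }
  return pid;  // -1 if fork failed
}

// Stand-in for the compressor: the child reads from the pipe until EOF.
// Returns the child's exit code (0 once it has seen EOF), or -1 on error.
int reader_exit_code_after_parent_closes() {
  int fds[2];
  if (pipe(fds) != 0) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    close(fds[1]);  // the child must drop its own copy of the write end
    char buf[256];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof(buf))) > 0) {
    }
    _exit(n == 0 ? 0 : 1);  // read() returning 0 means EOF
  }
  close(fds[0]);
  const char msg[] = "trace line\n";
  if (write(fds[1], msg, sizeof(msg) - 1) < 0) {
    // A failed write is ignored in this sketch; the child still sees EOF.
  }
  close(fds[1]);  // last write end closed: the reader now gets EOF
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Stand-in for the decompressor: the child keeps writing to the pipe.
// Returns the signal that terminated it, 0 if it exited, -1 on error.
int writer_signal_after_parent_closes() {
  int fds[2];
  if (pipe(fds) != 0) return -1;
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    signal(SIGPIPE, SIG_DFL);  // make sure the default action applies
    close(fds[0]);             // drop the child's copy of the read end
    const char chunk[4096] = {};
    for (;;) {
      if (write(fds[1], chunk, sizeof(chunk)) < 0) _exit(1);
    }
  }
  close(fds[1]);
  close(fds[0]);  // last read end closed: the next write raises SIGPIPE
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Note that both children close the pipe end they do not use. If a reader kept a copy of the write end open, it would never see EOF, which is a common reason for a pipe-connected child to linger after its parent is gone.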

accel-sim / accel-sim-framework

Support for xz-compressed traces #265