Binary request generator/replayer

byrnedj commented 4 months ago

This is the binary trace replayer/generator we have been using to reach maximum CPU utilization when replaying the kvcache traces in cachebench. With this generator, cachebench achieves a throughput of over 20 million op/sec on the kvcache workload. By comparison, the CSV replay generator reaches only ~1.6 million op/sec because of dynamic allocations and parsing overhead.

We avoid these allocations by mmap'ing the request data into memory and pointing a Request pointer directly at the request data, rather than allocating a new request wrapper for each request.
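A minimal C++ sketch of the mmap technique described above, assuming a POSIX system and a hypothetical fixed-size record layout. `BinaryRequest` (with `keyHash`, `valueSize`, `ttlSecs`, `op`) and `MappedTrace` are illustrative names only; the PR's actual `Request` type and on-disk format are not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical fixed-size record layout; the PR's real format may differ.
struct BinaryRequest {
  uint64_t keyHash;    // hypothetical: stands in for the request key
  uint32_t valueSize;  // object size in bytes
  uint32_t ttlSecs;    // time-to-live in seconds
  uint8_t op;          // hypothetical encoding: 0 = get, 1 = set
  uint8_t pad[7];      // explicit padding so the record is exactly 24 bytes
};
static_assert(sizeof(BinaryRequest) == 24, "unexpected record size");

// Maps a file of BinaryRequest records read-only. Replay reads records through
// pointers into the mapping, so no per-request wrapper is heap-allocated.
class MappedTrace {
 public:
  explicit MappedTrace(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("open failed: " + path);
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed: " + path);
    }
    bytes_ = static_cast<size_t>(st.st_size);
    count_ = bytes_ / sizeof(BinaryRequest);  // trailing partial record ignored
    if (count_ == 0) {
      ::close(fd);
      throw std::runtime_error("trace holds no complete records: " + path);
    }
    void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed: " + path);
    base_ = static_cast<const BinaryRequest*>(p);
  }
  ~MappedTrace() { ::munmap(const_cast<BinaryRequest*>(base_), bytes_); }
  MappedTrace(const MappedTrace&) = delete;
  MappedTrace& operator=(const MappedTrace&) = delete;

  size_t size() const { return count_; }

  // Returns a pointer into the mapping: no copy and no allocation.
  const BinaryRequest* at(size_t i) const {
    assert(i < count_);
    return base_ + i;
  }

 private:
  size_t bytes_ = 0;
  size_t count_ = 0;
  const BinaryRequest* base_ = nullptr;
};
```

Because records are fixed-size, `at(i)` is plain pointer arithmetic, and the kernel pages the data in on demand unless it has already been preloaded.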

To generate a binary request file from an existing kvcache trace (using the "replay" generator):

Specify the kvcache trace using the regular traceFileName or traceFileNames option, along with any other properties such as ampFactor.
In replayGeneratorConfig, set binaryFileName: "mybinaryfile.bin" as a config option.
Run cachebench and wait for the binary file to be generated.
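Conceptually, the generation step moves CSV parsing out of the replay path: each line is parsed once and written out as a packed record. The sketch below is a standalone illustration, not cachebench's implementation; it assumes a simplified `key,op,size,ttl` CSV layout (the real kvcache trace format is not described here) and the same hypothetical 24-byte `BinaryRequest` record, and `parseCsvLine` / `convertCsvToBinary` are made-up helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>

// Same hypothetical 24-byte record as on the replay side (an assumption).
struct BinaryRequest {
  uint64_t keyHash;
  uint32_t valueSize;
  uint32_t ttlSecs;
  uint8_t op;
  uint8_t pad[7];
};

// Parses one line of a simplified, hypothetical "key,op,size,ttl" CSV layout.
// Returns false for headers and malformed lines so the caller can skip them.
bool parseCsvLine(const std::string& line, BinaryRequest& out) {
  std::istringstream fields(line);
  std::string key, op, size, ttl;
  if (!std::getline(fields, key, ',') || !std::getline(fields, op, ',') ||
      !std::getline(fields, size, ',') || !std::getline(fields, ttl)) {
    return false;
  }
  out = BinaryRequest{};
  if (op == "GET") {
    out.op = 0;
  } else if (op == "SET") {
    out.op = 1;
  } else {
    return false;
  }
  try {
    out.valueSize = static_cast<uint32_t>(std::stoul(size));
    out.ttlSecs = static_cast<uint32_t>(std::stoul(ttl));
  } catch (const std::logic_error&) {  // invalid_argument or out_of_range
    return false;
  }
  out.keyHash = static_cast<uint64_t>(std::hash<std::string>{}(key));
  return true;
}

// Converts CSV text into packed records in one pass. Parsing is paid once
// here, at generation time, instead of on every replay.
size_t convertCsvToBinary(std::istream& csv, std::ostream& bin) {
  std::string line;
  size_t written = 0;
  while (std::getline(csv, line)) {
    BinaryRequest rec{};
    if (!parseCsvLine(line, rec)) continue;  // skip header and bad lines
    bin.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
    ++written;
  }
  return written;
}
```

In practice `bin` would be a `std::ofstream` opened in binary mode; the resulting file is exactly `written * sizeof(BinaryRequest)` bytes, which is what makes the fixed-stride mmap replay possible.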

To run a binary request trace, specify the following:

Set generator to "binary-replay"
Set traceFileName: "mybinaryfile.bin" and set ampSizeFactor (if desired)

In summary, this patch greatly reduces trace-replay overhead. It does assume the kvcache trace format and the kvcache replay generator's behavior. Additional features:

fast forwarding of a trace
preloading requests into memory
object size amplification
queue-free operation for even lower per-request overhead
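One way the features listed above could fit together is sketched below, under stated assumptions: fast forwarding is modeled as a starting offset into the record array, object size amplification as multiplying each record's size by `ampSizeFactor`, and the queue-free mode as each worker thread striding through the shared read-only array instead of pulling from a request queue. `StridedCursor` and `replayParallel` are hypothetical names; the PR may distribute requests differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Same hypothetical 24-byte record used in the other sketches.
struct BinaryRequest {
  uint64_t keyHash;
  uint32_t valueSize;
  uint32_t ttlSecs;
  uint8_t op;
  uint8_t pad[7];
};

// Hypothetical per-thread cursor. Thread t of T reads records
// fastForward + t, fastForward + t + T, ... straight from the shared,
// read-only array, so no queue or lock sits between trace and workers.
class StridedCursor {
 public:
  StridedCursor(const BinaryRequest* base, size_t count, size_t fastForward,
                size_t threadId, size_t numThreads, uint32_t ampSizeFactor)
      : base_(base),
        count_(count),
        next_(fastForward + threadId),  // fast forward: skip leading records
        stride_(numThreads),
        ampSizeFactor_(ampSizeFactor) {
    assert(numThreads > 0);
  }

  // Yields the next request for this thread; false once its share is done.
  bool next(uint64_t& key, uint64_t& size) {
    if (next_ >= count_) return false;
    const BinaryRequest& r = base_[next_];
    next_ += stride_;
    key = r.keyHash;
    size = uint64_t{r.valueSize} * ampSizeFactor_;  // size amplification
    return true;
  }

 private:
  const BinaryRequest* base_;
  size_t count_;
  size_t next_;
  size_t stride_;
  uint32_t ampSizeFactor_;
};

// Runs numThreads workers over the trace and sums the amplified sizes they
// would have issued; the sum stands in for real cache gets and sets.
uint64_t replayParallel(const std::vector<BinaryRequest>& trace,
                        size_t fastForward, size_t numThreads,
                        uint32_t ampSizeFactor) {
  std::vector<uint64_t> bytesPerThread(numThreads, 0);
  std::vector<std::thread> workers;
  for (size_t t = 0; t < numThreads; ++t) {
    workers.emplace_back([&, t] {
      StridedCursor cursor(trace.data(), trace.size(), fastForward, t,
                           numThreads, ampSizeFactor);
      uint64_t key = 0;
      uint64_t size = 0;
      while (cursor.next(key, size)) bytesPerThread[t] += size;
    });
  }
  for (auto& w : workers) w.join();
  uint64_t total = 0;
  for (uint64_t b : bytesPerThread) total += b;
  return total;
}
```

Static striding needs no shared counter at all; the trade-off is that requests from different threads interleave, so the global issue order only approximates the trace order.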

The limitations are:

no trace amplification (however you can amplify the original .csv trace and save it in binary format)
~4GB overhead per 100 million requests
you need some disk space to store large traces
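The stated overhead works out to about 40 bytes per request. A small helper for budgeting disk space, assuming decimal gigabytes and that the figure scales linearly with request count (`estimateTraceBytes` is a hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

// "~4GB per 100 million requests" implies roughly 40 bytes per request,
// assuming decimal gigabytes: 4e9 bytes / 1e8 requests.
constexpr uint64_t kBytesPerRequest = 4'000'000'000ULL / 100'000'000ULL;
static_assert(kBytesPerRequest == 40, "expected ~40 bytes per request");

// Rough size estimate for a binary trace, assuming linear scaling.
constexpr uint64_t estimateTraceBytes(uint64_t numRequests) {
  return numRequests * kBytesPerRequest;
}

static_assert(estimateTraceBytes(100'000'000ULL) == 4'000'000'000ULL,
              "reproduces the ~4GB per 100M requests figure");
```

For example, a 1-billion-request trace would need roughly 40 GB.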

facebook-github-bot commented 2 months ago

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

therealgymmy commented 1 month ago

@byrnedj: tested it out internally and verified a 10x throughput improvement. The binary trace currently does not repeat when the specified number of operations exceeds the trace length. Is this intended?

facebook-github-bot commented 1 month ago

@byrnedj has updated the pull request. You must reimport the pull request before landing.

byrnedj commented 1 month ago

I just added that functionality to the latest version.
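The thread does not show how the repeat behavior was implemented. A minimal sketch of one way to do it, assuming replay simply restarts from the first record once the trace is exhausted; `replayWithRepeat` is a hypothetical helper and the `visit` callback stands in for issuing the request to the cache:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: issue numOps requests from a trace of traceLen
// records, wrapping back to record 0 instead of stopping at the end.
// visit(i) stands in for sending record i to the cache.
template <typename Visit>
void replayWithRepeat(size_t traceLen, uint64_t numOps, Visit visit) {
  assert(traceLen > 0);
  size_t i = 0;
  for (uint64_t op = 0; op < numOps; ++op) {
    visit(i);
    if (++i == traceLen) i = 0;  // wrap: start another pass over the trace
  }
}
```

Resetting the index with a compare avoids a modulo (an integer division) per operation in the hot loop.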

therealgymmy commented 1 month ago

Thanks, let me re-import it.

facebook-github-bot commented 1 month ago

@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 1 week ago

@therealgymmy merged this pull request in facebook/CacheLib@253107481b6cff7e5d70fc54fce075bca2c463dc.

facebook / CacheLib

Binary request generator/replayer #307