byrnedj closed 1 week ago
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@byrnedj: tested it out internally and verified a 10x throughput improvement. The binary trace currently does not repeat if the specified number of operations is longer than the trace length. Is this intended?
@byrnedj has updated the pull request. You must reimport the pull request before landing.
I just added that functionality to the latest version.
Thanks, let me re-import.
@therealgymmy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@therealgymmy merged this pull request in facebook/CacheLib@253107481b6cff7e5d70fc54fce075bca2c463dc.
This is the binary trace replayer/generator that we have been using to achieve max CPU utilization for the kvcache traces in cachebench. With this generator, we can achieve a throughput of over 20 million ops/sec using the kvcache workload in cachebench. By comparison, the CSV replay generator reaches only ~1.6 million ops/sec because of dynamic allocations and parsing overhead.
We avoid allocations by mmap'ing the request data into memory and using a Request pointer to point to the request data rather than allocating a new request wrapper for each request.
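
As a rough illustration of the idea above, here is a minimal sketch of replaying fixed-size records straight out of an mmap'ed file. It is not the code from this patch: the `BinaryRequest` layout, the `MappedTrace` class, and the `writeTrace` helper are all hypothetical names invented for the example, and the real on-disk format in cachebench may differ. The sketch also wraps around at the end of the file, along the lines of the repeat behavior discussed in the review comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <vector>

// Hypothetical fixed-size record (an assumption, not the cachebench layout).
struct BinaryRequest {
  uint64_t keyId;
  uint32_t valueSize;
  uint8_t op;
  uint8_t pad[3];
};
static_assert(sizeof(BinaryRequest) == 16, "records must have a fixed size");

// Maps a file of BinaryRequest records read-only and hands out references
// into the mapping, so replaying a request performs no allocation or parsing.
class MappedTrace {
 public:
  explicit MappedTrace(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) {
      throw std::runtime_error("open failed: " + path);
    }
    struct stat st {};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0 ||
        static_cast<size_t>(st.st_size) % sizeof(BinaryRequest) != 0) {
      ::close(fd);
      throw std::runtime_error("unexpected trace size: " + path);
    }
    bytes_ = static_cast<size_t>(st.st_size);
    void* p = ::mmap(nullptr, bytes_, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) {
      throw std::runtime_error("mmap failed: " + path);
    }
    reqs_ = static_cast<const BinaryRequest*>(p);
    count_ = bytes_ / sizeof(BinaryRequest);
  }
  ~MappedTrace() { ::munmap(const_cast<BinaryRequest*>(reqs_), bytes_); }
  MappedTrace(const MappedTrace&) = delete;
  MappedTrace& operator=(const MappedTrace&) = delete;

  // Returns the next request and wraps to the start at the end of the trace.
  const BinaryRequest& next() {
    const BinaryRequest& r = reqs_[pos_];
    pos_ = (pos_ + 1) % count_;
    return r;
  }
  size_t size() const { return count_; }

 private:
  const BinaryRequest* reqs_ = nullptr;
  size_t bytes_ = 0;
  size_t count_ = 0;
  size_t pos_ = 0;
};

// Helper for trying the sketch out: dumps records to a file as raw bytes.
inline void writeTrace(const std::string& path,
                       const std::vector<BinaryRequest>& reqs) {
  std::FILE* f = std::fopen(path.c_str(), "wb");
  if (f == nullptr) {
    throw std::runtime_error("fopen failed: " + path);
  }
  size_t n = std::fwrite(reqs.data(), sizeof(BinaryRequest), reqs.size(), f);
  std::fclose(f);
  if (n != reqs.size()) {
    throw std::runtime_error("short write: " + path);
  }
}
```

A caller would construct one `MappedTrace` and call `next()` in the replay loop. Each call only advances an index and returns a reference into the mapping, which is roughly where the savings over per-line CSV parsing would come from.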
To generate a binary request file from an existing kvcache trace (using the "replay" generator):
- Specify the source trace with the traceFileNames or traceFileName option. Specify other properties such as ampFactor too.
- Specify binaryFileName: "mybinaryfile.bin" as a config option.
To run a binary request trace, specify the following:
- traceFileName: "mybinaryfile.bin"
- ampSizeFactor (if desired)
In summary, this patch offers much lower trace-replay overhead. It does assume the kvcache trace format and the kvcache replay generator's behavior. Additional features:
The limitations are: