Describe the bug
If the amplification factor of requests (i.e., test_config::replayGeneratorConfig::ampFactor) is set too high, KVReplayGenerator stops generating requests and CacheLib continuously reports a hit ratio of 0.00% in the output.
I first checked whether requests were still being generated and found that KVReplayGenerator::genRequests() ends up in an infinite loop.
https://github.com/facebook/CacheLib/blob/75457b785f3de23903acf651e74aa537992b7345/cachelib/cachebench/workload/KVReplayGenerator.h#L513C1-L519C4
The consumers (i.e., stressors) are supposed to drain the requests from the queue of one stressor (say, stressor 0), but every consumer (i.e., KVReplayGenerator::getReq()) also ends up in an infinite loop.
https://github.com/facebook/CacheLib/blob/75457b785f3de23903acf651e74aa537992b7345/cachelib/cachebench/workload/KVReplayGenerator.h#L504C1-L529C2
Because every consumer is stuck in that loop, waiting for requests from the queues of stressors other than 0, the queue of stressor 0 remains full.
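The stall described above can be modeled in miniature without any threads. This is only a sketch under assumptions: a single producer routes each request to a fixed-capacity per-stressor queue and keeps retrying the same request while that queue is full, and one stressor never dequeues. The names SimResult and simulateStall are hypothetical and are not part of CacheLib.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical model, not CacheLib code: one producer feeds a bounded queue
// per stressor and retries the same request until its target queue has room.
struct SimResult {
  std::size_t produced = 0;           // requests the producer managed to enqueue
  std::vector<std::size_t> consumed;  // requests each stressor dequeued
  bool producerStalled = false;       // producer was retrying when the run ended
};

// Runs `rounds` scheduling rounds. Stressor `stuckId` never dequeues, which
// stands in for a thread blocked inside a cache lookup.
SimResult simulateStall(std::size_t numStressors,
                        std::size_t queueCapacity,
                        std::size_t stuckId,
                        std::size_t rounds) {
  assert(numStressors > 0 && queueCapacity > 0);
  std::vector<std::deque<std::size_t>> queues(numStressors);
  SimResult res;
  res.consumed.assign(numStressors, 0);
  std::size_t nextReq = 0;  // id of the request the producer is trying to place

  for (std::size_t r = 0; r < rounds; ++r) {
    // Producer step: the target queue is fixed by the request id (round robin
    // here; routing by a key hash would stall in the same way).
    auto& target = queues[nextReq % numStressors];
    if (target.size() < queueCapacity) {
      target.push_back(nextReq++);
      ++res.produced;
      res.producerStalled = false;
    } else {
      res.producerStalled = true;  // keeps spinning on the same request
    }

    // Consumer step: every healthy stressor drains at most one request.
    for (std::size_t s = 0; s < numStressors; ++s) {
      if (s == stuckId || queues[s].empty()) {
        continue;  // stuck stressor never reads; an empty queue means spinning
      }
      queues[s].pop_front();
      ++res.consumed[s];
    }
  }
  return res;
}
```

With simulateStall(3, 2, 0, 100), only 6 requests are ever enqueued: once the queue of stressor 0 holds two entries, the producer retries forever, and stressors 1 and 2 find their queues empty on every later round.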
Using GDB and reviewing the source code, I identified three problems.
(1) One of the stressor threads keeps waiting in it.wait() in findFn, defined in Cache<Allocator>::find(Key key).
(2) One of the reader threads is stuck acquiring the TimedMutex, so it cannot finish onGetComplete.
(3) onGetComplete acquires a lock, which prevents removeFromFillMap from acquiring the lock later. It therefore seems that the lock must be released after checking hasTombStone(hk).
https://github.com/facebook/CacheLib/blob/75457b785f3de23903acf651e74aa537992b7345/cachelib/allocator/nvmcache/NvmCache.h#L1186C1-L1186C31
https://github.com/facebook/CacheLib/blob/75457b785f3de23903acf651e74aa537992b7345/cachelib/allocator/nvmcache/NvmCache.h#L1143C1-L1144C75
Please note that even if (2) does not happen, (1) still happens.
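The locking change suggested above (release the lock right after the tombstone check) could look roughly like the following. This is a sketch under assumptions, not the NvmCache implementation: FillMapSketch, startFill, markTombStone, and hasPendingFill are hypothetical names, and a plain std::mutex stands in for the TimedMutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the fill-map bookkeeping; not CacheLib's NvmCache.
class FillMapSketch {
 public:
  void startFill(const std::string& key) {
    assert(!key.empty());
    std::lock_guard<std::mutex> l(mutex_);
    fills_.insert(key);
  }

  void markTombStone(const std::string& key) {
    std::lock_guard<std::mutex> l(mutex_);
    tombstones_.insert(key);
  }

  bool hasPendingFill(const std::string& key) {
    std::lock_guard<std::mutex> l(mutex_);
    return fills_.count(key) > 0;
  }

  // Returns true if the fetched item may be inserted into the cache.
  bool onGetComplete(const std::string& key) {
    bool tombstoned = false;
    {
      // Hold the lock only for the tombstone check.
      std::lock_guard<std::mutex> l(mutex_);
      tombstoned = tombstones_.count(key) > 0;
    }  // lock released here, before removeFromFillMap() locks again
    removeFromFillMap(key);
    return !tombstoned;
  }

 private:
  void removeFromFillMap(const std::string& key) {
    // Would block forever if the caller still held mutex_ (non-recursive).
    std::lock_guard<std::mutex> l(mutex_);
    fills_.erase(key);
  }

  std::mutex mutex_;
  std::unordered_set<std::string> fills_;
  std::unordered_set<std::string> tombstones_;
};
```

The inner scope in onGetComplete is the whole point of the sketch: the guard is destroyed before removeFromFillMap runs, so the second acquisition cannot wait on the first.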
I attached the GDB stacktrace below.
To Reproduce
To speed up debugging, I simply return Status::NotFound in the lookup() function of the Navy cache.
Here's the JSON file that I used.
Expected behavior
No consumer thread should get stuck in an infinite loop at any point.
Screenshots
Here is part of the log messages that I printed for debugging purposes.
kv getReq while ... is printed inside the while statement of KVReplayGenerator::getReq().
genReq while ... is printed inside the while statement of KVReplayGenerator::genRequests().
Here is the output of GDB. One or more stressor threads are waiting.
One of the reader threads seems to be waiting to acquire the TimedMutex.
Desktop (please complete the following information):