apache / kvrocks

Apache Kvrocks is a distributed key value NoSQL database that uses RocksDB as storage engine and is compatible with Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Memory issue of kvrocks2redis detected by ASan/TSan #2195

Open Zakelly opened 5 months ago

Zakelly commented 5 months ago

Motivation

Currently, the Kvrocks2redis CI fails when built with TSAN/ASAN or when run in the macOS environment. An example: https://github.com/Zakelly/kvrocks/actions/runs/8344973371/job/22838786340 The failure is a value mismatch in the destination Redis server. More investigation is needed.

Solution

No response

jihuayu commented 5 months ago

Thank you @Zakelly! GitHub deletes the action log after some days. Can you download the log file and upload it here?

Zakelly commented 5 months ago

@jihuayu Sure thing. Log attached. logs_21881009491.zip

PragmaTwice commented 4 months ago

IMHO we should spend some time investigating this issue and find a solution before the next release. cc @git-hulk

git-hulk commented 4 months ago

@PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

PragmaTwice commented 4 months ago

> @PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

I think this issue is related to https://github.com/apache/kvrocks/issues/1988 which makes kvrocks2redis crash. It is important to keep kvrocks2redis usable, without crashing.

AntiTopQuark commented 4 months ago

> @PragmaTwice I may take some time to investigate this after resolving #2253, and it'd be great if other guys would like to dive deep into this issue.

@git-hulk Hi, have you started looking into this? If not, I'd like to give it a try.

git-hulk commented 4 months ago

@AntiTopQuark Not yet, thank you!

AntiTopQuark commented 4 months ago

@Zakelly Hi, I've investigated this issue and successfully reproduced it on my machine.

I suspect it's not a memory issue. Rather, compiling with ASAN and TSAN significantly slows down kvrocks2redis, so its data synchronization to Redis lags behind.

The simple sleep(0.02) in the check_consistency.py test script might then not give Redis enough time to receive the corresponding results, so the check fails.

As shown in the image below, although the test failed, the corresponding keys and values can still be found in Redis afterward. [image]

Could we try making the test script sleep longer?

AntiTopQuark commented 4 months ago

cc @git-hulk

git-hulk commented 4 months ago

@AntiTopQuark Thanks for your great investigation. Can we relax the check condition? For example, check it N times at a fixed interval and return as soon as it passes. cc @PragmaTwice What do you think?
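
For illustration only, the "check N times with a fixed interval" idea could look roughly like the sketch below. The helper name wait_until and its parameters (attempts, interval) are hypothetical and are not taken from check_consistency.py; the check callable stands in for whatever comparison the script makes against the destination Redis.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], attempts: int = 50, interval: float = 0.02) -> bool:
    """Call `check` up to `attempts` times, sleeping `interval` seconds between tries.

    Returns True as soon as `check` passes, or False if every attempt fails.
    Hypothetical helper: the name and defaults are assumptions, not project code.
    """
    for attempt in range(attempts):
        if check():
            return True
        # No need to sleep after the final failed attempt.
        if attempt < attempts - 1:
            time.sleep(interval)
    return False


# Example use inside a test (the lambda is a placeholder for the real comparison):
#   assert wait_until(lambda: redis_value() == expected), "value never synced"
```

Compared with a single longer sleep, this returns immediately on fast builds and only waits the full attempts * interval on slow (for example sanitizer-instrumented) builds.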

PragmaTwice commented 3 months ago

Yeah we can adjust 0.02 to see if it works.

Zakelly commented 3 months ago

Happy to see the great investigation! Thanks @AntiTopQuark