StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
9.08k stars 1.82k forks source link

[BugFix] fix MetdataCache concurrency issue (backport #52968) #52983

Closed mergify[bot] closed 3 days ago

mergify[bot] commented 3 days ago

Why I'm doing:

In previous PR, we introduced MetadataCache to achieve LRU strategy for rowset metadata memory manage, and when rowset been evict, then we will call this function:

void MetadataCache::_cache_value_deleter(const CacheKey& /*key*/, void* value) {
    reinterpret_cast<Rowset*>(value)->close();
}

to close rowset and release metadata memory.

But in some case, this rowset which pointer point to could has been destroy, but _cache_value_deleter doesn't know that and will still keep calling close(), and it will lead to concurrency issue.

What I'm doing:

This pull request includes several changes to the MetadataCache class and related functionality in the rowset module. The main goals are to improve memory management by using std::weak_ptr for cached rowsets and to rename the warmup_rowset method to refresh_rowset for clarity. Additionally, a new concurrency test is added to ensure thread safety.

Memory management improvements:

Method renaming for clarity:

New test for concurrency:

What type of PR is this:

Does this PR entail a change in behavior?

If yes, please specify the type of change:

Checklist:

Bugfix cherry-pick branch check:

But in some case, this rowset which pointer point to could has been destroy, but _cache_value_deleter doesn't know that and will still keep calling close(), and it will lead to concurrency issue.

What I'm doing:

This pull request includes several changes to the MetadataCache class and related functionality in the rowset module. The main goals are to improve memory management by using std::weak_ptr for cached rowsets and to rename the warmup_rowset method to refresh_rowset for clarity. Additionally, a new concurrency test is added to ensure thread safety.

Memory management improvements:

Method renaming for clarity:

New test for concurrency:

What type of PR is this:

Does this PR entail a change in behavior?

If yes, please specify the type of change:

Checklist: