Open freemandealer opened 1 month ago
I'd like to work on it, please assign it to me
Sure, thanks for your participation, and welcome to the Doris community.
The main idea is to scan the disk once and then scan _files once.
In FileCacheStorage, add a virtual function unordered_set checkConsistency(BlockFileCache* _mgr, lambda handler); where handler is a lambda used to handle inconsistent AccessKeyAndOffset entries, recording any inconsistencies found. An inconsistency means that an AccessKeyAndOffset exists only in either BlockFileCache or FSFileCacheStorage.
In the implementation of checkConsistency in FSFileCacheStorage, the main task is to iterate through the fileBlock directory items under _cache_base_path, checking for their existence in _files of BlockFileCache and whether their sizes are consistent. If an entry does not exist, the handler is called; if it exists, it is recorded in an unordered_set (used for the return value).
In BlockFileCache, add a function checkConsistency, which has two main parts. The first part calls the _storage’s checkConsistency, obtaining its return value (an unordered_set that records which AccessKeyAndOffset entries have already been found during the disk scan). The second part iterates through _files, and if any item is not found in the unordered_set, it calls the handler to record this inconsistency, ultimately returning these inconsistent items.
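The two passes described above can be sketched roughly as follows. This is a simplified illustration, not Doris code: `BlockId`, `Handler`, `check_storage`, and `check_consistency` are hypothetical names, each side is modelled as a plain map from id to size, and treating a size mismatch as "report it, but still mark the entry as seen" is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for AccessKeyAndOffset, e.g. "<hash>@<offset>".
using BlockId = std::string;
// Called once per inconsistency; `reason` says what is wrong with the entry.
using Handler = std::function<void(const BlockId&, const std::string& reason)>;
using SizeMap = std::unordered_map<BlockId, std::size_t>;  // id -> size in bytes

// Pass 1 (storage side): walk the on-disk entries and look each one up in memory.
// Returns the ids that exist on both sides.
std::unordered_set<BlockId> check_storage(const SizeMap& on_disk,
                                          const SizeMap& in_memory,
                                          const Handler& handler) {
    std::unordered_set<BlockId> seen;
    for (const auto& entry : on_disk) {
        auto it = in_memory.find(entry.first);
        if (it == in_memory.end()) {
            handler(entry.first, "on disk only");
            continue;
        }
        if (it->second != entry.second) {
            handler(entry.first, "size mismatch");  // assumption: still counts as seen
        }
        seen.insert(entry.first);
    }
    return seen;
}

// Pass 2 (cache side): anything in memory that pass 1 never saw is missing on disk.
std::vector<std::pair<BlockId, std::string>> check_consistency(const SizeMap& on_disk,
                                                               const SizeMap& in_memory) {
    std::vector<std::pair<BlockId, std::string>> problems;
    Handler handler = [&problems](const BlockId& id, const std::string& reason) {
        problems.emplace_back(id, reason);
    };
    const auto seen = check_storage(on_disk, in_memory, handler);
    for (const auto& entry : in_memory) {
        if (seen.count(entry.first) == 0) {
            handler(entry.first, "in memory only");
        }
    }
    return problems;
}
```

Each side is walked exactly once, and the set returned by the first pass is what lets the second pass avoid touching the disk again.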
In terms of the API, in FileCacheAction, add two types of operations. One is to input a path and check the consistency of that path, which essentially calls BlockFileCache's checkConsistency. The second is to obtain all paths (to facilitate the use of the first operation). Is it ok for the API's return to be inconsistent file names and offsets? Any suggestions regarding function naming? It is a frustrating issue.
hi @Lupinus
In FileCacheStorage, add a virtual function unordered_set checkConsistency(BlockFileCache* _mgr, lambda handler)
We don't need _mgr as its parameter since the call chain is as follows: FileCacheAction -> BlockFileCache(holding _mutex) -> specific storage
checking for their existence in _files of BlockFileCache and whether their sizes are consistent.
Additionally, please consider the consistency of their metadata. Please take a look at FileBlock::cache_type() and FileBlock::expiration_time(). This metadata is encoded into the directory name and file name in the filesystem; see FSFileCacheStorage::load_cache_info_into_memory for details.
BTW, when using unordered_map, the map key should include all of the above-mentioned info, because the file path itself could be duplicated (but with a different cache type or expiration_time).
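One way to follow this advice is a composite key with a custom hash. The sketch below is only an illustration: `BlockKey`, `BlockKeyHash`, `BlockKeySet`, and the `CacheType` enum are hypothetical stand-ins, not the real Doris types.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical mirror of the cache types; the real enum lives in the Doris source.
enum class CacheType { Normal, Index, Ttl, Disposable };

// Everything that distinguishes one cached block from another.
struct BlockKey {
    std::string hash;               // cache key hash
    std::size_t offset;             // block offset within the cached file
    CacheType type;                 // see FileBlock::cache_type()
    std::uint64_t expiration_time;  // see FileBlock::expiration_time()

    bool operator==(const BlockKey& o) const {
        return hash == o.hash && offset == o.offset && type == o.type &&
               expiration_time == o.expiration_time;
    }
};

struct BlockKeyHash {
    std::size_t operator()(const BlockKey& k) const {
        std::size_t h = std::hash<std::string>{}(k.hash);
        // Simple hash-combine step applied to each remaining field.
        auto mix = [&h](std::size_t v) { h ^= v + 0x9e3779b9u + (h << 6) + (h >> 2); };
        mix(std::hash<std::size_t>{}(k.offset));
        mix(static_cast<std::size_t>(k.type));
        mix(std::hash<std::uint64_t>{}(k.expiration_time));
        return h;
    }
};

using BlockKeySet = std::unordered_set<BlockKey, BlockKeyHash>;
```

With this key, two entries that share a hash and offset but differ in cache type or expiration time are kept as separate elements instead of silently overwriting each other.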
Is it ok for the API's return to be inconsistent file names and offsets?
Given that an inconsistency can fall into one of two categories, i.e. missing in _files vs. missing in the filesystem, we should point out the category along with the file path (the full file path, not the file name alone) in the HTTP response.
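A report that carries both the category and the full path could look roughly like this. The names `Inconsistency`, `Finding`, `kind_name`, and `render_report` are hypothetical, the plain-text layout is an assumption (the thread does not fix a response format), and the paths in the usage note are placeholders.

```cpp
#include <string>
#include <vector>

// The two categories named in the comment above.
enum class Inconsistency { MissingInFiles, MissingInFilesystem };

struct Finding {
    Inconsistency kind;
    std::string path;  // full file path, not just the file name
};

const char* kind_name(Inconsistency kind) {
    switch (kind) {
        case Inconsistency::MissingInFiles:
            return "missing in _files";
        case Inconsistency::MissingInFilesystem:
            return "missing in filesystem";
    }
    return "unknown";
}

// Renders one "<category>: <path>" line per finding.
std::string render_report(const std::vector<Finding>& findings) {
    std::string out;
    for (const auto& f : findings) {
        out += kind_name(f.kind);
        out += ": ";
        out += f.path;
        out += "\n";
    }
    return out;
}
```

For example, a finding of kind `MissingInFiles` with the placeholder path `/cache/k1/0` is rendered as the line `missing in _files: /cache/k1/0`.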
Any suggestions regarding function naming? It is a frustrating issue.
No problem with the naming. And what do you mean by 'a frustrating issue'? Is it too easy or hard for you? If there is any problem with the issue itself, please help me improve it. I appreciate your help in advance.
Sorry for not expressing myself clearly. By "issue" I meant the problem of naming a function.
Description
Occasionally, we have found disk cache data escaping the management of the Doris file cache, causing disk space leaks. To make debugging easier, we need a checking tool that compares the contents of the Doris file cache's in-memory management structure with the current disk contents and identifies the differences between the two (which are potentially problematic data).
To better understand how file cache works, please refer to: https://doris.apache.org/zh-CN/docs/dev/compute-storage-decoupled/file-cache/ and https://www.bilibili.com/video/BV1ath9eGEqL
Basic Ideas
Because the cache changes rapidly, we should freeze it (by taking a lock) to get a snapshot of its current state.
Then parse that state to determine which data should be cached.
Then scan the disk (also during the freeze) to see which data actually exists.
Finally, compare the two and print the diff in the logs.
Implementation Tips
We could use a RESTful API to trigger the check. FYI, see be/src/http/action/file_cache_action.cpp for more details on RESTful API support in Doris.
If you get in any trouble ...
Do not hesitate to contact me by WeChat 15811301868
Related issues
No response