Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0
6.87k stars 2.94k forks source link

Fix failed to delete an empty directory due to cache evict #18675

Open gp1314 opened 2 months ago

gp1314 commented 2 months ago

What changes are proposed in this pull request?

phenomenon

reason

  1. Write data to directory A and flush directory A to rocksdb when the cache is full
  2. File b is written to directory A,b cached in InodeCache,A->b cached in EdgeCache
  3. If the cache is full, EdgeCache flush is triggered,A->b in rocksdb edges column, evict A from mListingCache
  4. Delete b file,add into mUnflushedDeletes
  5. At this time, in CachingInodeStore.hasChildren
    • directory A cannot be retrieved from mListingCache(mListingCache.getCachedChildIds not present)
    • and b cannot be obtained from inodecache because b has been deleted(mListingCache.getDataFromBackingStore return empty)
    • we can also get A->b at rocksdb (mBackingStore.hasChildren return true)

reproduction

@Test public void evictInodeForTestHasChildren() throws Exception { MutableInodeDirectory myTestdir = MutableInodeDirectory.create(500, 0, "guptest", CreateDirectoryContext.defaults()); mStore.writeNewInode(myTestdir); mStore.mListingCache.evictForTesting(); mStore.mListingCache.evictForTesting(); mStore.mEdgeCache.flush(); mStore.mInodeCache.flush();

long id = 6000;
MutableInodeFile child =
    MutableInodeFile.create(id, 500, "child" + id, 0, CreateFileContext.defaults());
mStore.writeNewInode(child);
mStore.addChild(500, child);

mStore.mEdgeCache.flush();

mStore.removeInodeAndParentEdge(child);
assertEquals(true, mStore.hasChildren(myTestdir));

}


- Or set configuration

alluxio.master.metastore.inode.cache.max.size=1024 alluxio.master.metastore.inode.cache.evict.batch.size=100


- write subdir-i/i, delete i and subdir-i (i loop 500) 

### Why are the changes needed?

Please clarify why the changes are needed. For instance,
  1. In `CachingInodeStore.hasChildren`, can remove `mBackingStore.hasChildren(inode)`, because `mListingCache.GetDataFromBackingStore` already from rocksdb checked again

### Does this PR introduce any user facing changes?
No changes to the user-facing api
alluxio-bot commented 2 months ago

Automated checks report:

Some checks failed. Please fix the reported issues and reply alluxio-bot, check this please to re-run checks.

gp1314 commented 2 months ago

alluxio-bot, check this please

alluxio-bot commented 2 months ago

Automated checks report:

Some checks failed. Please fix the reported issues and reply alluxio-bot, check this please to re-run checks.

alluxio-bot commented 2 months ago

Automated checks report:

Some checks failed. Please fix the reported issues and reply alluxio-bot, check this please to re-run checks.

alluxio-bot commented 2 months ago

Automated checks report:

All checks passed!

gp1314 commented 2 months ago

@jiacheliu3 Could you please help core review it

jiacheliu3 commented 1 month ago

That is very interesting finding, thanks a lot @gp1314 ! The logic here is definitely complicated and let me spend some time thinking and get back to you.

Right now, I feel the "issue" is because there are many lock-free operations in the CachingInodeStore. So operations on different parts of the internal cache (inode cache, edge cache, listing cache) are not atomic. Your solution is probably correct, but let me think twice about possible alternatives.

gp1314 commented 4 days ago

@jiacheliu3 Could you please help core review it again