Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Worker reports a 0-byte block replica to the master under highly concurrent access #18705

Open microbn opened 2 days ago

microbn commented 2 days ago

Alluxio Version:
2.9.3

Describe the bug
In UnderFileSystemBlockReader.updateBlockWriter, if mLocalBlockStore.createBlock succeeds but mLocalBlockStore.createBlockWriter then throws an exception because the temp block check fails, a 0-byte replica is created and mBlockWriter is left null.
When the request ends and the UnderFileSystemBlockReader is closed, the replica size check is skipped because mBlockWriter is null, and the block is then committed to the master. DefaultBlockMaster.commitBlock adds this replica to the block's locations and only prints a log message.

To Reproduce

Expected behavior
When the temp block check fails, the temp block metadata should be cleaned up.
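
To make the expected cleanup concrete, here is a minimal, self-contained sketch of the failure path and the proposed handling. It is not Alluxio's actual code: SketchBlockStore, SketchBlockReader, abortBlock, and the mCleanUpOnFailure flag are hypothetical stand-ins, and the store is a plain in-memory map. It only assumes what is described above: createBlock succeeds, createBlockWriter throws, and close later commits whatever temp block is still present.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the worker's local block store (not Alluxio's API). */
class SketchBlockStore {
  private final Map<Long, Long> mTempBlocks = new HashMap<>();      // blockId -> bytes written
  private final Map<Long, Long> mCommittedBlocks = new HashMap<>(); // blockId -> committed length
  /** When true, createBlockWriter simulates a failed temp block check. */
  boolean mFailWriterCreation;

  /** Registers an empty temp block. */
  void createBlock(long blockId) {
    mTempBlocks.put(blockId, 0L);
  }

  /** Returns a dummy writer handle, or throws to simulate the failed temp block check. */
  Object createBlockWriter(long blockId) {
    if (mFailWriterCreation) {
      throw new IllegalStateException("temp block check failed for block " + blockId);
    }
    return new Object();
  }

  /** Drops the temp block so that nothing is left to commit. */
  void abortBlock(long blockId) {
    mTempBlocks.remove(blockId);
  }

  /** Promotes the temp block, if any, to a committed replica with its current length. */
  void commitBlock(long blockId) {
    Long length = mTempBlocks.remove(blockId);
    if (length != null) {
      mCommittedBlocks.put(blockId, length);
    }
  }

  boolean hasTempBlock(long blockId) {
    return mTempBlocks.containsKey(blockId);
  }

  /** Returns the committed length, or null if the block was never committed. */
  Long committedLength(long blockId) {
    return mCommittedBlocks.get(blockId);
  }
}

/** Hypothetical reader that mirrors the reported updateBlockWriter/close sequence. */
class SketchBlockReader {
  private final SketchBlockStore mStore;
  private final long mBlockId;
  private final boolean mCleanUpOnFailure; // false = reported behavior, true = expected behavior
  private Object mBlockWriter;

  SketchBlockReader(SketchBlockStore store, long blockId, boolean cleanUpOnFailure) {
    mStore = store;
    mBlockId = blockId;
    mCleanUpOnFailure = cleanUpOnFailure;
  }

  void updateBlockWriter() {
    mStore.createBlock(mBlockId);
    try {
      mBlockWriter = mStore.createBlockWriter(mBlockId);
    } catch (IllegalStateException e) {
      mBlockWriter = null;
      if (mCleanUpOnFailure) {
        // Expected behavior: remove the temp block metadata created just above.
        mStore.abortBlock(mBlockId);
      }
    }
  }

  /** Commits any temp block that is still registered, without checking its size. */
  void close() {
    if (mStore.hasTempBlock(mBlockId)) {
      mStore.commitBlock(mBlockId);
    }
  }
}
```

With mCleanUpOnFailure set to false, the sketch ends up with a committed 0-byte replica, which matches the symptom reported here. With it set to true, no temp block remains when close runs, so nothing is committed.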

Urgency
This bug causes data reads to fail whenever they hit the corrupt replica.

Are you planning to fix it
Yes, I have fixed this and deployed the fix to our online cluster.

Additional context