Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Cached data lost: worker does not report newly added blocks to the standby master; after HA failover, client cannot access already cached replicas #18706

Open microbn opened 2 days ago

microbn commented 2 days ago

Alluxio Version: 2.9.3

Describe the bug

  1. When WorkerRegisterToAllMasters is enabled, the worker does not report new blocks to the standby master, because commitBlockToMaster is not an empty method. The leader master does not send the block location in the journal, because blockMeta is created at commitBlock. This block meta is created when the file meta is created and is written to the block store.

  2. During a block report, the standby master waits 1 second when it does not find the block meta. When journal application is delayed, the block replica is lost.
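
The race in point 2 can be illustrated with a small toy model. This is only a sketch and is not Alluxio code: the class `StandbyBlockReportSketch`, its methods, and the `journalDelayMs` parameter are hypothetical names made up for illustration, and time is simulated with plain numbers instead of real threads. It assumes the standby drops a reported location once its wait for the block meta expires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of a standby master handling block reports. Hypothetical, not Alluxio code. */
public class StandbyBlockReportSketch {
  /** Block IDs whose metadata has already been applied from the journal. */
  private final Set<Long> blockMeta = new HashSet<>();
  /** Block ID -> workers known to hold a replica. */
  private final Map<Long, Set<String>> locations = new HashMap<>();
  /** How long the standby waits for missing block metadata (1000 ms in the report). */
  private final long waitMs;

  public StandbyBlockReportSketch(long waitMs) {
    this.waitMs = waitMs;
  }

  /** Journal replay creates the block metadata on the standby. */
  public void applyJournalEntry(long blockId) {
    blockMeta.add(blockId);
  }

  /**
   * Handles a worker's report of a single block.
   *
   * @param journalDelayMs simulated delay between the report arriving and the
   *                       journal entry for this block being applied
   * @return true if the replica location was recorded, false if it was dropped
   */
  public boolean reportBlock(String worker, long blockId, long journalDelayMs) {
    if (!blockMeta.contains(blockId)) {
      if (journalDelayMs > waitMs) {
        // The wait expired first: the location is dropped. The metadata
        // arrives later, but nothing re-adds the replica location.
        applyJournalEntry(blockId);
        return false;
      }
      // The journal entry arrived within the wait window.
      applyJournalEntry(blockId);
    }
    locations.computeIfAbsent(blockId, k -> new HashSet<>()).add(worker);
    return true;
  }

  public Set<String> getLocations(long blockId) {
    return locations.getOrDefault(blockId, Collections.<String>emptySet());
  }

  public static void main(String[] args) {
    StandbyBlockReportSketch standby = new StandbyBlockReportSketch(1000);
    standby.reportBlock("worker-1", 1L, 200);   // journal applied in time
    standby.reportBlock("worker-1", 2L, 3000);  // journal applied too late
    System.out.println("block 1 locations: " + standby.getLocations(1L)); // [worker-1]
    System.out.println("block 2 locations: " + standby.getLocations(2L)); // []
  }
}
```

In this model, block 2 has metadata on the standby but no location, which matches the symptom described here: after the standby becomes leader, the client finds no location for a block that is still cached on a worker.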

To Reproduce

Enable WorkerRegisterToAllMasters, write data, trigger an HA failover, then check the file's block locations.

Expected behavior

After an HA failover, cached data must not be lost.

Urgency

After an HA failover, the client cannot find block locations, and read requests take longer.

Are you planning to fix it

Yes. The fix is already done and we are preparing to upgrade.