Open ccy00808 opened 1 year ago
Yes, your reasoning makes sense. One option is to have the BlockMaster remove the worker from the collection when the worker re-registers. Please let us know when you have the PR out, thanks!
Alluxio Version: 2.7.0
Describe the bug When a worker restarts, its metadata is not removed from the master's collection of active workers. When the worker re-registers, its metadata is locked for the duration of the registration. As a result, monitoring requests for worker data and client syncs of worker status cannot acquire the contended lock and must wait until registration completes. Recovery takes 2 to 4 minutes with 5 million (500W) blocks and about 2 hours with 25 million (2500W) blocks.
To Reproduce Have a worker cache about 10 million (1000W) blocks, restart the worker, and repeatedly run "wget http://host:19999/metrics/prometheus/" to check whether the complete monitoring data is returned.
Expected behavior Monitoring data is returned normally while the worker re-registers.
Urgency Moderate
Are you planning to fix it Yes
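The fix suggested in the reply (remove the worker from the collection when it re-registers) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Alluxio's BlockMaster code: `WorkerRegistrySketch`, `WorkerMeta`, `beginRegister`, `finishRegister`, and `snapshotBlockCounts` are hypothetical names, and the per-worker `ReentrantLock` stands in for whatever lock the real master holds during registration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model only. All names are hypothetical and do not mirror Alluxio's classes. */
class WorkerRegistrySketch {
  /** Hypothetical per-worker metadata guarded by its own lock. */
  static final class WorkerMeta {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    long blockCount;

    WorkerMeta(long id) {
      this.id = id;
    }
  }

  // Workers visible to readers such as monitoring and client status sync.
  private final Map<Long, WorkerMeta> active = new ConcurrentHashMap<>();
  // Workers that are in the middle of a (possibly long) registration.
  private final Map<Long, WorkerMeta> registering = new ConcurrentHashMap<>();

  /** Hides the worker from readers first, then takes its lock for registration. */
  WorkerMeta beginRegister(long id) {
    WorkerMeta meta = active.remove(id);
    if (meta == null) {
      meta = new WorkerMeta(id);
    }
    registering.put(id, meta);
    meta.lock.lock();
    return meta;
  }

  /** Must run on the thread that called beginRegister, since it releases that lock. */
  void finishRegister(WorkerMeta meta, long blockCount) {
    meta.blockCount = blockCount;
    meta.lock.unlock();
    registering.remove(meta.id);
    active.put(meta.id, meta); // visible to readers again
  }

  /**
   * Reader path: only iterates active workers, so it does not wait on a worker
   * that was removed before its registration lock was taken. A reader that
   * fetched a worker just before the removal could still wait on that lock.
   */
  List<Long> snapshotBlockCounts() {
    List<Long> counts = new ArrayList<>();
    for (WorkerMeta meta : active.values()) {
      meta.lock.lock();
      try {
        counts.add(meta.blockCount);
      } finally {
        meta.lock.unlock();
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    WorkerRegistrySketch registry = new WorkerRegistrySketch();
    registry.finishRegister(registry.beginRegister(1L), 100L);
    registry.finishRegister(registry.beginRegister(2L), 200L);

    // Worker 1 restarts and holds its lock while re-registering.
    WorkerMeta restarting = registry.beginRegister(1L);
    // A reader on another thread still completes, because worker 1 is hidden.
    List<Long> during =
        CompletableFuture.supplyAsync(registry::snapshotBlockCounts).join();
    System.out.println("visible during re-registration: " + during);

    registry.finishRegister(restarting, 150L);
    System.out.println("visible afterwards: " + registry.snapshotBlockCounts());
  }
}
```

In this model the trade-off is that a re-registering worker is temporarily missing from monitoring output instead of stalling it. Whether that is acceptable for the real master, and how to close the small window noted in the reader comment, is left to the actual PR.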