LLNL / lmt

Lustre Monitoring Tools
GNU General Public License v2.0

Include progress in WAITING_FOR_CLIENTS status #65

Open behlendorf opened 9 months ago

behlendorf commented 9 months ago

When the MDT is starting and is in the WAITING_FOR_CLIENTS state, the server is most likely processing the MGS config logs. This is done sequentially and involves setting up a connection to each of the MDT and OST servers. Normally this is fairly quick; however, if servers are down, or a connection otherwise fails, the full timeout must elapse before the server proceeds to set up the next connection. When there are a large number of unreachable servers this can take a very long time.
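As a rough illustration of why this adds up, the sketch below multiplies the number of unreachable servers by a per-connection timeout. It assumes each failed connection costs exactly one full timeout and that nothing is attempted in parallel; the function name and the numbers in the usage line are hypothetical, not values taken from Lustre.

```python
def added_startup_delay_s(unreachable_servers: int, connect_timeout_s: float) -> float:
    """Estimate the extra startup delay from sequential connection timeouts.

    Assumption: every unreachable server costs one full timeout, and the
    connections are set up strictly one after another.
    """
    if unreachable_servers < 0 or connect_timeout_s < 0:
        raise ValueError("inputs must be non-negative")
    return unreachable_servers * connect_timeout_s


# Illustrative numbers only: 20 unreachable servers, 100 s timeout each.
delay = added_startup_delay_s(20, 100)
print(f"{delay:.0f} s (~{delay / 60:.0f} min)")
```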

It would be helpful if some form of progress could be reported after the WAITING_FOR_CLIENTS line for each MDT. One option might be to add the number of records in the MGS mount log being processed, and the current record, to the recovery_status file.

0000     server1 WAITING_FOR_CLIENTS setup 45/209 connections

Example recovery_status file for MDT0000

status: WAITING_FOR_CLIENTS
mount_progress: 45/209
recovery_start: 1702343662
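
A minimal sketch of how a monitoring tool might consume this, assuming the `mount_progress` field proposed above exists in the form `current/total` (it is a suggestion in this issue, not an existing field). The function names `parse_recovery_status` and `format_mdt_line` are hypothetical and are not part of LMT.

```python
import re


def parse_recovery_status(text: str) -> dict:
    """Parse 'key: value' lines from a recovery_status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def format_mdt_line(index: str, server: str, fields: dict) -> str:
    """Build a one-line summary, appending progress when it is available."""
    status = fields.get("status", "UNKNOWN")
    line = f"{index} {server} {status}"
    # 'mount_progress' is the hypothetical field proposed in this issue.
    progress = fields.get("mount_progress", "")
    match = re.fullmatch(r"(\d+)/(\d+)", progress)
    if status == "WAITING_FOR_CLIENTS" and match:
        line += f" setup {match.group(1)}/{match.group(2)} connections"
    return line


SAMPLE = """status: WAITING_FOR_CLIENTS
mount_progress: 45/209
recovery_start: 1702343662
"""

print(format_mdt_line("0000", "server1", parse_recovery_status(SAMPLE)))
```

If `mount_progress` is absent or malformed, the summary falls back to the plain status line, so output for servers without the new field would be unchanged.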