Currently, the methods I can think of to implement this feature are:
1. Add a way to schedule ctt to automatically update an issue at a given time so that it offlines the whole blade
2. Have ctt create a reservation for when the work should start and add the blade in question to it
3. Have ctt remove user-facing queues from the Qlist of the blade's nodes ahead of time to drain the blade
I'll think about this for a bit and then sketch out a design for the team to review.
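As a rough illustration of what option 1's scheduled action might boil down to, here is a sketch that builds the command to offline every node on a blade. It assumes the action ends up calling PBS Professional's `pbsnodes -o` with a `-C` comment; the function name `offline_blade_argv` and the blade-to-nodes mapping are hypothetical, and ctt would need its own source of truth for which nodes share a blade.

```python
def offline_blade_argv(blade: str, blade_nodes: dict[str, list[str]], comment: str) -> list[str]:
    """Build a pbsnodes command line that marks every node on `blade` offline.

    `blade_nodes` is a hypothetical blade -> node hostnames mapping.
    """
    nodes = blade_nodes.get(blade, [])
    if not nodes:
        # Refuse to build a command that would offline nothing (or the wrong thing).
        raise ValueError(f"no nodes known for blade {blade!r}")
    # Assumed PBS Professional syntax: pbsnodes -C <comment> -o <node> [<node> ...]
    return ["pbsnodes", "-C", comment, "-o", *nodes]


# Made-up node names; a scheduler could pass this list to subprocess.run().
print(offline_blade_argv("blade7", {"blade7": ["node13", "node14"]}, "HPE service"))
# -> ['pbsnodes', '-C', 'HPE service', '-o', 'node13', 'node14']
```

Returning an argument list rather than a shell string keeps comments containing spaces from needing any quoting.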
Option 1 is probably the easiest to implement, but it is my least favorite. From other discussions, I think adding a "testing" state to ctt, where it is easy for HSG/CSG/HPE to test nodes through PBS while keeping users off of them (e.g. for node checkout after hardware work), would be beneficial. Options 2 and 3 give a fairly clear path to accomplishing this, but option 1 doesn't.
Conceptually, using reservations (option 2) fits my mental model best, and it will allow short jobs to backfill onto the blade before the work starts while the other options won't, so it is my preference. However, reservations have been known to crash PBS, so I'll have to do some testing to confirm whether this is feasible.
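The backfilling advantage of option 2 comes down to a simple walltime check. The sketch below is a simplified model, not PBS's actual scheduler logic: a job may start on a node held for a future reservation only if its requested walltime ends before the reservation begins. The times used are hypothetical. Under options 1 and 3, by contrast, the node accepts no new user jobs regardless of walltime.

```python
from datetime import datetime, timedelta


def can_backfill(now: datetime, walltime: timedelta, reservation_start: datetime) -> bool:
    """Simplified model: a job started `now` fits only if it ends by reservation_start."""
    return now + walltime <= reservation_start


service_start = datetime(2024, 5, 6, 8, 0)  # hypothetical 8am service window
evening = datetime(2024, 5, 5, 20, 0)        # 8pm the night before
print(can_backfill(evening, timedelta(hours=2), service_start))   # True: ends at 10pm
print(can_backfill(evening, timedelta(hours=24), service_start))  # False: runs past 8am
```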
Option 3 should work; however, some care would have to go into handling the Qlist for nodes. Because we tend to change it regularly on Casper, ctt could cause serious confusion if we get this wrong.
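One way to limit that confusion would be for ctt to record each node's Qlist before draining it and to refuse to restore it if someone has edited it in the meantime. The sketch below assumes the Qlist can be treated as a plain list of queue names; the queue names, the `QlistDrain` record and both functions are hypothetical, and how ctt would actually read and write the Qlist in PBS is not shown. Leaving non-user queues in place (here, a made-up `staff` queue) is also how a drained node could double as the "testing" state mentioned above.

```python
from dataclasses import dataclass


@dataclass
class QlistDrain:
    """Hypothetical record of a node's Qlist before and after ctt drained it."""
    node: str
    original: list[str]
    drained: list[str]


def drain_qlist(node: str, qlist: list[str], user_queues: set[str]) -> QlistDrain:
    # Drop user-facing queues but keep any others (e.g. a staff/testing queue).
    drained = [q for q in qlist if q not in user_queues]
    return QlistDrain(node=node, original=list(qlist), drained=drained)


def restore_qlist(record: QlistDrain, current: list[str]) -> list[str]:
    """Return the pre-drain Qlist, unless it was changed while the node was drained."""
    if current != record.drained:
        # An admin edited the Qlist during the drain; don't silently overwrite it.
        raise RuntimeError(f"{record.node}: Qlist changed while drained; refusing to restore")
    return list(record.original)


rec = drain_qlist("node01", ["regular", "gpu", "staff"], {"regular", "gpu"})
print(rec.drained)                       # ['staff']
print(restore_qlist(rec, ["staff"]))     # ['regular', 'gpu', 'staff']
```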
Original issue from NCAR/ctt_client: We are using scripts to offline the nodes at 8pm via cron for HPE service the next business day.
We would like to integrate this functionality directly into ctt.
The functionality and the way to make it work are the author's choice.
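Folding the current cron job into ctt mostly means ctt computing the trigger time itself and acting on it from a periodic check. A minimal sketch, assuming "the evening before" simply means 8pm on the previous calendar day (weekend and holiday handling is left out); `offline_time` and `is_due` are hypothetical names, not existing ctt functions.

```python
from datetime import date, datetime, time, timedelta


def offline_time(service_day: date, at: time = time(20, 0)) -> datetime:
    """When to offline nodes for service on `service_day` (8pm the day before by default)."""
    return datetime.combine(service_day - timedelta(days=1), at)


def is_due(now: datetime, scheduled: datetime, already_done: bool) -> bool:
    """True when a periodic ctt check should perform the scheduled offline."""
    return not already_done and now >= scheduled


monday = date(2024, 5, 6)
print(offline_time(monday))  # 2024-05-05 20:00:00
```

Tracking `already_done` keeps a periodic check from re-offlining nodes that staff have already brought back early.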