eth-cscs / manta

Another CLI for Alps
https://eth-cscs.github.io/manta/
BSD 3-Clause "New" or "Revised" License

FEAT: reuse or pin nodes when moving nodes between HSM groups as a result of an upscaling or downscaling operation #78

Open Masber opened 3 months ago

Masber commented 3 months ago

Upscaling or downscaling an HSM group from a hardware description involves moving nodes between the target and the parent HSM groups.

The current implementation combines the target and parent HSM groups and then moves the best candidates back to the target based on their score. It does not consider whether a candidate was already part of the target, so the new target HSM group may not reuse nodes that were originally in the target even when they have the best score; a node from the parent HSM group may be chosen instead. This behaviour is good for cleaning up or reducing fragmentation, but it is undesirable on production clusters running a WLM with long jobs: the number of nodes being changed may be too large and can dramatically increase the time needed to finish the operation, since nodes need to be moved one at a time.
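To make the problem concrete, here is a minimal sketch of score-only selection over the combined pool. It is not manta's actual code: the `Node` type, the integer scores, the tie-break by xname and the helper names are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical candidate node: an xname plus an already computed score.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    xname: String,
    score: u32,
}

/// Small constructor helper (illustrative only).
fn node(xname: &str, score: u32) -> Node {
    Node { xname: xname.to_string(), score }
}

/// Combine target and parent, then keep the `wanted` best-scoring nodes.
/// Ties are broken by xname (an assumption), so membership in the
/// original target group plays no role in the choice.
fn select_by_score(target: &[Node], parent: &[Node], wanted: usize) -> Vec<Node> {
    let mut pool: Vec<Node> = target.iter().chain(parent.iter()).cloned().collect();
    pool.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| a.xname.cmp(&b.xname)));
    pool.truncate(wanted);
    pool
}

/// Count the nodes in the new target that were not in the old one,
/// i.e. the nodes that would have to be moved in one at a time.
fn nodes_moved_in(old_target: &[Node], new_target: &[Node]) -> usize {
    let old: HashSet<&str> = old_target.iter().map(|n| n.xname.as_str()).collect();
    new_target
        .iter()
        .filter(|n| !old.contains(n.xname.as_str()))
        .count()
}
```

With two target nodes and two parent nodes that all share the same score, this sketch replaces both target nodes with parent nodes, although keeping the original two would have scored just as well.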

This ticket addresses the problem by adding a new --pin argument to the apply hw cluster command. The goal is to pin the nodes in the target HSM group and reuse them if their score is the maximum available. In other words, when looking for the best candidate to move to the target HSM group, consider the nodes already in the target HSM group and prioritise them if they can be used as the best candidate.
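One possible reading of the pinning rule is a tie-break: among candidates with the same score, prefer the ones already in the target group. The sketch below assumes that reading. The `Candidate` type, the `select_candidates` function, its `pin` parameter and the integer scores are hypothetical, not part of manta.

```rust
use std::cmp::Reverse;

/// Hypothetical candidate: xname, score, and whether the node
/// currently belongs to the target HSM group.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    xname: String,
    score: u32,
    in_target: bool,
}

/// Select `wanted` nodes from the combined target + parent pool.
/// With `pin` set, nodes already in the target win ties on score;
/// a pinned node with a lower score is still not preferred.
fn select_candidates(
    target: &[(&str, u32)],
    parent: &[(&str, u32)],
    wanted: usize,
    pin: bool,
) -> Vec<String> {
    let mut pool: Vec<Candidate> = target
        .iter()
        .map(|&(x, s)| Candidate { xname: x.to_string(), score: s, in_target: true })
        .chain(parent.iter().map(|&(x, s)| Candidate {
            xname: x.to_string(),
            score: s,
            in_target: false,
        }))
        .collect();

    // Sort key: highest score first, then pinned nodes (false sorts
    // before true), then xname to keep the result deterministic.
    pool.sort_by_key(|c| (Reverse(c.score), !(pin && c.in_target), c.xname.clone()));

    pool.into_iter().take(wanted).map(|c| c.xname).collect()
}
```

In this sketch a target node is only reused when its score matches the best available one, so a target node with a lower score can still be replaced by a better node from the parent group.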