NearNodeFlash / NearNodeFlash.github.io

View this document https://nearnodeflash.github.io/
Apache License 2.0

WLM must honor DirectiveBreakdown constraints #59

Closed by matthew-richerson 1 year ago

matthew-richerson commented 1 year ago

The DirectiveBreakdown status section lists the storage and compute requirements for a DW directive. The Status.Storage section gives requirements for each of the storage allocation sets. Included in each allocation set is a Status.Storage.AllocationSets.Constraints field that limits the storage that can be picked for that allocation set based on a set of rules. For example, this is the Status.Storage.AllocationSets section for a Lustre file system with a shared mgt/mdt:

    allocationSets:
    - allocationStrategy: AllocateAcrossServers
      constraints:
        labels:
        - dws.cray.hpe.com/storage=Rabbit
        scale: 5
      label: ost
      minimumCapacity: 1073741824
    - allocationStrategy: AllocateAcrossServers
      constraints:
        colocation:
        - key: lustre-mgt
          type: exclusive
        count: 1
        labels:
        - dws.cray.hpe.com/storage=Rabbit
      label: mgtmdt
      minimumCapacity: 5368709120

Both allocation sets have a "labels" constraint that restricts them to Storage resources carrying the dws.cray.hpe.com/storage=Rabbit label.
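As an illustration of the "labels" constraint, here is a minimal Python sketch of how a WLM might filter candidate storage. It is not taken from DWS or from any WLM. The function names and the dictionary of storage labels are hypothetical, and the assumption that each constraint entry is a "key=value" string (with a bare key meaning "label present") is mine.

```python
def parse_label(label):
    """Split a 'key=value' constraint string; value is None for a bare key."""
    key, sep, value = label.partition("=")
    return key, (value if sep else None)


def storage_matches(storage_labels, required_labels):
    """Return True if a storage's labels satisfy every required label.

    storage_labels: dict of label key -> value (hypothetical stand-in for
    the labels on a Storage resource).
    required_labels: list of strings from constraints.labels.
    """
    for label in required_labels:
        key, value = parse_label(label)
        if key not in storage_labels:
            return False
        # Assumption: a bare key only requires the label to be present.
        if value is not None and storage_labels[key] != value:
            return False
    return True


def eligible_storages(storages, required_labels):
    """Names of the storages that may be picked for an allocation set."""
    return [
        name
        for name, labels in storages.items()
        if storage_matches(labels, required_labels)
    ]


# Hypothetical inventory for illustration only.
storages = {
    "rabbit-0": {"dws.cray.hpe.com/storage": "Rabbit"},
    "other-0": {"dws.cray.hpe.com/storage": "Other"},
    "unlabeled-0": {},
}
candidates = eligible_storages(storages, ["dws.cray.hpe.com/storage=Rabbit"])
```

Only storage that passes this filter would then be considered when applying the remaining constraints.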

The mgtmdt allocation set also has a colocation constraint. This constraint means that the allocation set must have exclusive use of the storage node among all allocation sets that have a colocation constraint with the "lustre-mgt" key. Nothing restricts colocating it with allocation sets that have no colocation constraint, or whose colocation constraint uses a different key. The end result is to prevent MGTs from different file systems from being allocated on the same storage node, while still allowing an MGT to share a storage node with other allocations (OSTs, MDTs, xfs file systems, etc.).
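The exclusive colocation rule can be sketched as a check over a proposed placement. This is a hedged illustration, not the DWS or Flux implementation. The function names and the `placements` structure (a list of `(node, allocation_set)` pairs, one entry per allocation set per node, covering all file systems) are hypothetical. The allocation set dictionaries mirror the YAML fields shown above.

```python
from collections import defaultdict


def exclusive_keys(allocation_set):
    """Collect the keys of 'exclusive' colocation constraints, if any."""
    constraints = allocation_set.get("constraints", {})
    return {
        c["key"]
        for c in constraints.get("colocation", [])
        if c.get("type") == "exclusive"
    }


def find_colocation_violations(placements):
    """Return (node, key) pairs where more than one allocation set with the
    same exclusive colocation key landed on the same storage node.

    placements: list of (node_name, allocation_set_dict) pairs
    (hypothetical structure).
    """
    counts = defaultdict(int)
    for node, allocation_set in placements:
        for key in exclusive_keys(allocation_set):
            counts[(node, key)] += 1
    return sorted(pair for pair, n in counts.items() if n > 1)


# Allocation sets shaped like the example above.
mgtmdt = {
    "label": "mgtmdt",
    "constraints": {
        "colocation": [{"key": "lustre-mgt", "type": "exclusive"}],
        "count": 1,
        "labels": ["dws.cray.hpe.com/storage=Rabbit"],
    },
}
ost = {
    "label": "ost",
    "constraints": {"labels": ["dws.cray.hpe.com/storage=Rabbit"], "scale": 5},
}

# Two file systems whose mgtmdt allocation sets share rabbit-1: a violation.
bad = [("rabbit-1", mgtmdt), ("rabbit-1", mgtmdt), ("rabbit-1", ost)]
# An mgtmdt sharing a node with an OST only: allowed.
good = [("rabbit-1", mgtmdt), ("rabbit-1", ost), ("rabbit-2", mgtmdt)]

bad_result = find_colocation_violations(bad)
good_result = find_colocation_violations(good)
```

Because allocation sets without a colocation constraint contribute no keys, the OST never conflicts with the MGT, which matches the rule described above.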

GitHub issue 32 shows that these constraints aren't being honored: the mgt/mdt allocations for two Lustre file systems were placed on a single Rabbit node.

jameshcorbett commented 1 year ago

I think this is more appropriate as an issue against the Flux/Rabbit integration, so I made https://github.com/flux-framework/flux-coral2/issues/70 and will close this, but feel free to reopen.