Open liyier90 opened 3 weeks ago
Placement groups can conflict with the ODCR, leading to an insufficient capacity exception (ICE) error.
Targeted ODCRs are already mapped to a single spine in an AZ, so a placement group is redundant and can cause ICE errors.
Thanks for the explanation, @sean-smith. May I know if there is a distinction between an ODCR and a Capacity Block? I ask because Capacity Block is listed as a different reservation type on the console.
In the pcluster config (https://github.com/aws-samples/awsome-distributed-training/blob/main/1.architectures/2.aws-parallelcluster/distributed-training-p4de-base.yaml#L42-L43), there is a comment saying we should set `PlacementGroup` to `Enabled: false` when using a targeted ODCR. May I know if this also applies to a targeted "Capacity Blocks for ML" reservation?
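
For reference, this is roughly the part of the queue config I am asking about. It is a trimmed sketch, not the full file: the queue name, compute resource name, counts, subnet, and reservation ID are placeholders I filled in for illustration, and I am assuming the reservation is attached through `CapacityReservationTarget`.

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute-gpu                      # placeholder queue name
      CapacityType: ONDEMAND
      Networking:
        SubnetIds:
          - subnet-PLACEHOLDER               # placeholder subnet
        PlacementGroup:
          Enabled: false                     # disabled because a targeted ODCR is used
      ComputeResources:
        - Name: distributed-ml               # placeholder compute resource name
          InstanceType: p4de.24xlarge
          MinCount: 2                        # placeholder counts
          MaxCount: 2
          CapacityReservationTarget:
            CapacityReservationId: cr-PLACEHOLDER   # placeholder reservation ID
```

The question is whether `PlacementGroup` should stay disabled in the same way when the reservation ID points at a Capacity Block rather than an ODCR.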