Closed by cfregly 4 months ago
Check the CloudWatch logs first. The failure is most likely caused by the lifecycle configuration scripts, but it could also come from insufficient capacity, a service quota limit, the wrong Availability Zone, the wrong subnet, or a similar setup problem.
If the CloudWatch logs do not provide enough information, please notify SageMaker HyperPod support and we'll help you debug the issue further.
"FailureMessage": "Instance i-XXX failed to provision with the following error: \"Lifecycle scripts did not run successfully. Ensure the scripts exist in provided S3 path, are accessible, and run without errors. Please see CloudWatch logs for lifecycle script execution details.\" Note that multiple instances may be impacted."
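For anyone hitting the same message, here is a rough sketch of pulling the lifecycle script output from CloudWatch with boto3. It assumes HyperPod writes to a log group named `/aws/sagemaker/Clusters/<cluster-name>/<cluster-id>` with log streams prefixed `LifecycleConfig/`, and that the cluster ID is the last segment of the cluster ARN; verify both against your own account. The helper names (`cluster_id_from_arn`, `log_group_name`, `fetch_lifecycle_logs`) are made up for this example.

```python
from typing import List


def cluster_id_from_arn(cluster_arn: str) -> str:
    """Return the trailing ID of an ARN shaped like ...:cluster/<cluster-id>."""
    return cluster_arn.rsplit("/", 1)[-1]


def log_group_name(cluster_name: str, cluster_id: str) -> str:
    """Build the log group name (naming convention is an assumption)."""
    return f"/aws/sagemaker/Clusters/{cluster_name}/{cluster_id}"


def fetch_lifecycle_logs(cluster_name: str, cluster_arn: str,
                         region: str, limit: int = 50) -> List[str]:
    """Fetch recent lifecycle script log messages for a cluster.

    Requires boto3 and AWS credentials allowed to call
    logs:FilterLogEvents on the log group.
    """
    import boto3  # imported here so the pure helpers work without it

    logs = boto3.client("logs", region_name=region)
    group = log_group_name(cluster_name, cluster_id_from_arn(cluster_arn))
    response = logs.filter_log_events(
        logGroupName=group,
        logStreamNamePrefix="LifecycleConfig/",  # assumed stream prefix
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

Calling `fetch_lifecycle_logs("my-cluster", "<your-cluster-arn>", "us-west-2")` should return the most recent script output lines, which usually show which lifecycle script failed and why. If the log group does not exist at all, the instances probably never got far enough to run the scripts, which points toward the capacity, quota, AZ, or subnet causes instead.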