aws-samples / host-yolov8-on-sagemaker-endpoint

MIT No Attribution

Initial Cloudformation steps don't work #16

Open jeremymatt opened 8 months ago

jeremymatt commented 8 months ago

Preface: I'm an AWS noob, so apologies in advance if there's something obvious I'm missing.

The instructions state that the CloudFormation stack can be created either with the CloudFormation "launch stack" link or by following a series of steps using the AWS CDK. I have been unable to get either approach working. The steps I've tried are detailed below.

CloudFormation "launch stack" template

  1. Click the link
  2. Select a name
  3. Click the button for "acknowledge creation of IAM resources"
  4. Click "create stack"
  5. CloudFormation starts creating the stack and then starts rolling back the changes.
  6. Select "detect root cause". This finds that "CustomS3AutoDeleteObjectsCustomResourceProviderHandler" is the likely problem, and that it fails because the nodejs12.x runtime is no longer supported:

Resource handler returned message: "The runtime parameter of nodejs12.x is no longer supported for creating or updating AWS Lambda functions. We recommend you use the new runtime (nodejs18.x) while creating or updating functions. (Service: Lambda, Status Code: 400, Request ID: d611eb01-007a-4154-9fde-b4b7201028fa)" (RequestToken: d13e5b8c-5193-4889-be3d-92d8cccf76a8, HandlerErrorCode: InvalidRequest)

  7. I inspected the YAML file and changed the runtime to nodejs18.x, then repeated steps 2, 3, and 4 with the updated YAML file. This also fails, with a likely root cause of:

    Resource handler returned message: "Error occurred while GetObject. S3 Error Code: PermanentRedirect. S3 Error Message: The bucket is in this region: us-east-1. Please use this region to retry the request (Service: Lambda, Status Code: 400, Request ID: e8e95860-729d-453a-8331-d7c3239eae38)" (RequestToken: 333a794a-9ca8-3bd0-bf94-2e387c4350c4, HandlerErrorCode: InvalidRequest)

I initially tried running CloudFormation in us-east-2 because that is the region where I have the greatest available quotas. I'm not sure which us-east-1 bucket is being referenced, or how to change the file to refer to a us-east-2 bucket instead.
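For reference, the runtime edit described above, plus a search for the bucket reference, can be sketched in Python as below. This is only a rough sketch under stated assumptions: the SAMPLE fragment, its bucket name, and the function names are hypothetical and are not taken from the repository's actual template, and it assumes the runtime appears as an unquoted `Runtime: nodejs12.x` line.

```python
# Hypothetical template fragment for illustration only; the real template differs.
SAMPLE = """\
Resources:
  ExampleHandler:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        S3Bucket: example-assets-us-east-1
        S3Key: example.zip
      Handler: index.handler
      Runtime: nodejs12.x
"""


def update_lambda_runtime(template_text, old="nodejs12.x", new="nodejs18.x"):
    """Rewrite every `Runtime: <old>` line to use <new>, leaving other lines untouched."""
    lines = []
    for line in template_text.splitlines(keepends=True):
        stripped = line.strip()
        # Only touch lines whose key is Runtime and whose value is exactly the old runtime.
        if stripped.startswith("Runtime:") and stripped.split(":", 1)[1].strip() == old:
            line = line.replace(old, new)
        lines.append(line)
    return "".join(lines)


def find_s3_bucket_lines(template_text):
    """Return (line_number, text) pairs for lines that mention an S3Bucket key."""
    return [
        (number, line.strip())
        for number, line in enumerate(template_text.splitlines(), start=1)
        if "S3Bucket" in line
    ]


if __name__ == "__main__":
    print(update_lambda_runtime(SAMPLE))
    for number, text in find_s3_bucket_lines(SAMPLE):
        print(f"line {number}: {text}")
```

Listing the S3Bucket lines at least shows where the template points at a bucket. Whether that bucket can simply be swapped for one in another region is a separate question that this sketch does not answer.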

  8. I switched CloudFormation to the us-east-1 region and re-ran the updated YAML file (with nodejs18.x as the runtime), and got the following error:

The account-level service limit 'ml.m5.4xlarge for notebook instance usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please use AWS Service Quotas to request an increase for this quota. If AWS Service Quotas is not available, contact AWS support to request an increase for this quota. (Service: AmazonSageMaker; Status Code: 400; Error Code: ResourceLimitExceeded; Request ID: ab79c7e5-36b7-4ea9-8cc0-bcad41e4293f; Proxy: null)

Given the byzantine AWS quota structure, it's not clear to me exactly which quota(s) I need to have increased, or how many days I'd need to wait for the increase. I've requested an increase of "ml.m5.4xlarge for notebook instance usage" within the SageMaker section. Hopefully this is the correct quota and I don't need additional quotas (e.g., SageMaker endpoint, SageMaker training, Lambda). I don't see any other references to instance types in the YAML file, so hopefully it'll work if/when the quota is increased.
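To pin down which quota the error is actually about, the ResourceLimitExceeded message itself can be parsed. The sketch below is my own illustration, not part of the repository: the function name and the "minimum_needed" field are hypothetical, the regular expression is fitted only to the wording of the message quoted above, and reading "current utilization plus request delta" as the smallest workable limit is my assumption.

```python
import re

# Pattern fitted to the wording of the error message quoted above (an assumption;
# other services or future messages may be phrased differently).
QUOTA_ERROR = re.compile(
    r"service limit '(?P<quota>[^']+)' is (?P<limit>\d+) Instances, "
    r"with current utilization of (?P<used>\d+) Instances "
    r"and a request delta of (?P<delta>\d+) Instances"
)


def parse_quota_error(message):
    """Extract the quota name and counts from a ResourceLimitExceeded message, or None."""
    match = QUOTA_ERROR.search(message)
    if match is None:
        return None
    limit = int(match["limit"])
    used = int(match["used"])
    delta = int(match["delta"])
    return {
        "quota": match["quota"],
        "limit": limit,
        "used": used,
        "delta": delta,
        # Assumed reading: the request fits once the limit covers usage plus the delta.
        "minimum_needed": used + delta,
    }


if __name__ == "__main__":
    error = (
        "The account-level service limit 'ml.m5.4xlarge for notebook instance usage' "
        "is 0 Instances, with current utilization of 0 Instances and a request delta "
        "of 1 Instances."
    )
    print(parse_quota_error(error))
```

For the message above this reports the quota name "ml.m5.4xlarge for notebook instance usage" with a limit of 0 and a delta of 1, which matches the quota I requested. It says nothing about whether later steps will hit other quotas.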

AWS CDK

The steps seem pretty straightforward, but it's not clear to me where I should be running these commands. I tried using a terminal in a SageMaker JupyterLab space, but got "bash: cdk: command not found" when I tried to run the cdk synth command. Is there some other place I should be running these commands?
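The "command not found" message means the shell could not find a cdk executable on its PATH in that environment. A small check like the one below (a sketch; the function names are hypothetical) can confirm that before running cdk synth. As far as I understand, the CDK CLI is a Node.js package usually installed with `npm install -g aws-cdk`, so it would not be present unless Node.js and that package have been installed in the environment; whether a SageMaker terminal is a suitable place to do that is exactly what I'm unsure about.

```python
import shutil


def find_cli(name="cdk"):
    """Return the full path of an executable on PATH, or None if it is not installed."""
    return shutil.which(name)


def describe(name="cdk"):
    """Summarize in one line whether the given command is available."""
    path = find_cli(name)
    if path is None:
        return f"{name}: not found on PATH"
    return f"{name}: found at {path}"


if __name__ == "__main__":
    # The CDK CLI depends on Node.js, so both are worth checking.
    for command in ("node", "npm", "cdk"):
        print(describe(command))
```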

Thanks in advance for any help you can give, -J