awslabs / sagemaker-battlesnake-ai

Starter pack to build an AI for Battlesnake with Amazon SageMaker. More content on the wiki:
https://github.com/awslabs/sagemaker-battlesnake-ai/wiki
Apache License 2.0

BattlesnakeNotebook initialization fails on S3 bucket name inconsistency #28

Closed: dangor closed this issue 1 year ago.

dangor commented 3 years ago

After stack creation, the log for BattlesnakeNotebook/LifecycleConfigOnStart includes the following line:

boto3.exceptions.S3UploadFailedError: Failed to upload RLlibEnv/inference/model.tar.gz to bonhomme-snake/battlesnake-aws/pretrainedmodels/model.tar.gz: An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist

where bonhomme-snake was the value I chose for the parameter SolutionS3BucketName.

Looking in S3, I see that the bucket was named sagemaker-solutions-bonhomme-snake.

Looking in the CloudFormation template YAML file, I see that the bucket is created with BucketName: !Sub "sagemaker-solutions-${SolutionS3BucketName}": https://github.com/awslabs/sagemaker-battlesnake-ai/blob/master/CloudFormation/deploy-battlesnake-endpoint.yaml#L68

whereas the sed commands don't add the same sagemaker-solutions- prefix and use only the bare bucket name: https://github.com/awslabs/sagemaker-battlesnake-ai/blob/master/CloudFormation/deploy-battlesnake-endpoint.yaml#L243-L245

I got unstuck by manually creating an S3 bucket without the prefix, adding that bucket and the objects under it explicitly to the NotebookInstanceExecutionRole's IAM permissions, and then restarting the notebook instance.

(I did not try updating the sed commands in the script.)
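A minimal Python sketch of the mismatch, assuming only what is quoted above: the template builds the bucket name with !Sub "sagemaker-solutions-${SolutionS3BucketName}", while the upload targeted the bare parameter value. The helper names created_bucket_name and upload_uri are hypothetical and do not come from the repository; the object key is copied from the error message.

```python
# Hypothetical sketch: mirrors the CloudFormation expression
#   BucketName: !Sub "sagemaker-solutions-${SolutionS3BucketName}"
# in plain Python, to show why an upload that uses the bare parameter fails.

BUCKET_PREFIX = "sagemaker-solutions-"  # prefix taken from the template's !Sub expression


def created_bucket_name(solution_s3_bucket_name: str) -> str:
    """Name of the bucket that CloudFormation actually creates."""
    return f"{BUCKET_PREFIX}{solution_s3_bucket_name}"


def upload_uri(bucket: str, key: str) -> str:
    """S3 URI that an upload to `bucket`/`key` would target."""
    return f"s3://{bucket}/{key}"


param = "bonhomme-snake"  # value chosen for SolutionS3BucketName
key = "battlesnake-aws/pretrainedmodels/model.tar.gz"  # key from the error message

# Bucket the lifecycle script targeted (bare parameter) vs. the bucket that exists.
targeted = upload_uri(param, key)
existing = upload_uri(created_bucket_name(param), key)

print(targeted)              # s3://bonhomme-snake/battlesnake-aws/pretrainedmodels/model.tar.gz
print(existing)              # s3://sagemaker-solutions-bonhomme-snake/battlesnake-aws/pretrainedmodels/model.tar.gz
print(targeted == existing)  # False, hence NoSuchBucket
```

Any script that derives the bucket from SolutionS3BucketName would need to apply the same prefix for the two names to agree.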

phossen commented 3 years ago

Hey @dangor, I am trying to follow the steps of your fix. As I am new to AWS, I don't quite understand where to modify the NotebookInstanceExecutionRole's IAM permissions. Could you describe that in a little more detail? Thanks in advance!

dangor commented 3 years ago

@phossen No worries, I'll try:

  1. Get to AWS Console -> IAM
  2. Navigate to Access management -> Roles
  3. In the role list, choose the one that has "NotebookInstanceExecutionRole" in its name, e.g. <prefix>-NotebookInstanceExecutionRole-<random id>
  4. Click "Edit Policy"
  5. (You can choose to go to the JSON tab to edit the JSON directly, but since you're new to AWS, it'd probably be better to stay in the default Visual Editor.)
  6. Expand the "S3" section
  7. Expand the "Resources" section
  8. Under "bucket," click "Add ARN"
  9. In the dialog, enter the bucket name, and click Add
  10. Under "object," click "Add ARN"
  11. In the dialog, enter the bucket name and then * for the object name. Then click Add
  12. Then restart the notebook instance so that the IAM policy is refreshed.
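For readers who use the JSON tab mentioned in step 5, steps 8 to 11 amount to adding two ARNs to the S3 statement's Resource list: the bucket ARN and the same ARN followed by /*. The sketch below assumes a simplified, hypothetical policy document; the real NotebookInstanceExecutionRole policy will contain other statements and actions, and add_bucket_to_s3_statements is an illustrative helper, not part of the project.

```python
import json
from copy import deepcopy


def s3_arns(bucket: str) -> list:
    """ARN for the bucket itself (step 9) and for every object in it (step 11)."""
    return [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"]


def as_list(value) -> list:
    """IAM allows Action/Resource to be a single string or a list; normalize to a list."""
    return list(value) if isinstance(value, list) else [value]


def add_bucket_to_s3_statements(policy: dict, bucket: str) -> dict:
    """Return a copy of `policy` whose S3 statements (actions prefixed "s3:") also cover `bucket`."""
    updated = deepcopy(policy)
    for stmt in updated["Statement"]:
        actions = as_list(stmt.get("Action", []))
        if "Resource" not in stmt or not any(a.startswith("s3:") for a in actions):
            continue  # leave non-S3 statements (and NotResource-style ones) untouched
        resources = as_list(stmt["Resource"])
        for arn in s3_arns(bucket):
            if arn not in resources:  # avoid duplicates if run twice
                resources.append(arn)
        stmt["Resource"] = resources
    return updated


# Hypothetical, simplified policy; the real role's policy will differ.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::sagemaker-solutions-bonhomme-snake",
                "arn:aws:s3:::sagemaker-solutions-bonhomme-snake/*",
            ],
        },
        {"Effect": "Allow", "Action": "logs:PutLogEvents", "Resource": "*"},
    ],
}

print(json.dumps(add_bucket_to_s3_statements(example_policy, "bonhomme-snake"), indent=2))
```

The helper returns a copy rather than editing the policy in place, so the original document can be compared with the result before it is pasted back into the JSON editor.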

I want to note that I've abandoned this project because, for now, I found the learning curve for updating the machine learning part too steep. Even before you get that far, you'll run into other challenges, e.g. dealing with the API v0 -> v1 migration that Battlesnake now requires.

jonomon commented 3 years ago

Hi @dangor and @phossen,

I will push a new update to the repository within the next week. This update will also migrate the API to v1.

Thanks!

jonomon commented 3 years ago

Fixed in https://github.com/awslabs/sagemaker-battlesnake-ai/commit/789036399cfdc034d8b79c00feec134834476a9b