aws / amazon-sagemaker-examples

Example 📓 Jupyter notebooks that demonstrate how to build, train, and deploy machine learning models using 🧠 Amazon SageMaker.
https://sagemaker-examples.readthedocs.io
Apache License 2.0

tensorflow_script_mode_horovod - CloudFormation template is not working #757

Open sermolin opened 5 years ago

sermolin commented 5 years ago

Hello. The last part of this notebook (Setup VPC infrastructure) does not work for me. The stack fails to create. Here is the full error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-153-54e5db02afa8> in <module>()
     42 
     43 
---> 44 subnets, security_groups = create_vpn_infra()
     45 print("Subnets: {}".format(subnets))
     46 print("Security Groups: {}".format(security_groups))

<ipython-input-153-54e5db02afa8> in create_vpn_infra(stack_name)
     25 
     26     if describe_stack["StackStatus"] != "CREATE_COMPLETE":
---> 27         raise ValueError("Stack creation failed in state: {}".format(describe_stack["StackStatus"]))
     28 
     29     print("Stack: {} created successfully with status: {}".format(stack_name, describe_stack["StackStatus"]))

ValueError: Stack creation failed in state: ROLLBACK_IN_PROGRESS

Here is my CloudFormation IAM policy attached to the role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudformation:*"
            ],
            "Resource": "*"
        }
    ]
}

I can't see any debug or log information in the AWS console's view of the CloudFormation stack. How do I debug this? Can you provide an example of the CloudFormation IAM policy under which this deployment succeeded for you?
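
One way to surface the failure reason is to read the stack's event history with boto3 and print the events whose status ends in `_FAILED`. The following is a minimal sketch, not the notebook's code: the helper names `failed_events` and `print_stack_failures` are hypothetical, and the stack name is whatever was passed to `create_vpn_infra()`. It assumes the calling role is allowed `cloudformation:DescribeStackEvents`, which the `cloudformation:*` policy above covers.

```python
def failed_events(events):
    """Return (logical_id, status, reason) for each event whose status ends in _FAILED."""
    return [
        (e["LogicalResourceId"], e["ResourceStatus"], e.get("ResourceStatusReason", ""))
        for e in events
        if e["ResourceStatus"].endswith("_FAILED")
    ]


def print_stack_failures(stack_name):
    """Print every failed event recorded for the given stack (hypothetical helper)."""
    # boto3 is imported lazily so failed_events() can be used without it installed.
    import boto3

    cfn = boto3.client("cloudformation")
    paginator = cfn.get_paginator("describe_stack_events")
    for page in paginator.paginate(StackName=stack_name):
        for logical_id, status, reason in failed_events(page["StackEvents"]):
            print("{} {}: {}".format(logical_id, status, reason))
```

Calling `print_stack_failures("<your stack name>")` after the `ValueError` should list the resources that failed to create along with the reason CloudFormation recorded for each, which is usually enough to tell whether a missing IAM permission is the cause.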

sermolin commented 5 years ago

The IAM role needs to be updated as follows:

  1. The AWS managed policy "AmazonVPCFullAccess" needs to be added.
  2. An inline custom policy needs to be added. Here is the JSON for it:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "cloudformation:*",
                "Resource": "*"
            }
        ]
    }

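
The two role changes listed above can also be applied from Python. This is a minimal sketch under stated assumptions: the function names and the inline policy name `CloudFormationFullAccessInline` are hypothetical, the role name must be supplied by the caller, and the credentials running it need `iam:AttachRolePolicy` and `iam:PutRolePolicy`.

```python
import json

# ARN of the AWS managed policy named in step 1.
VPC_MANAGED_POLICY_ARN = "arn:aws:iam::aws:policy/AmazonVPCFullAccess"


def cloudformation_inline_policy():
    """Build the inline policy document from step 2."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": "cloudformation:*",
                "Resource": "*",
            }
        ],
    }


def update_role(role_name, inline_policy_name="CloudFormationFullAccessInline"):
    """Attach the managed policy and add the inline policy to the given role."""
    # boto3 is imported lazily so the policy builder works without it installed.
    import boto3

    iam = boto3.client("iam")
    iam.attach_role_policy(RoleName=role_name, PolicyArn=VPC_MANAGED_POLICY_ARN)
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=inline_policy_name,
        PolicyDocument=json.dumps(cloudformation_inline_policy()),
    )
```

Both IAM calls can be repeated safely: attaching a managed policy that is already attached makes no change, and `put_role_policy` overwrites an inline policy of the same name.
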
laurenyu commented 5 years ago

Hi @sermolin, thanks for the heads-up about this notebook issue! Would you be interested in submitting a PR with the fixes you've found?