aws-samples / amazon-ec2-nice-dcv-samples

AWS CloudFormation templates to provision Linux or Windows EC2 instances with GUI running NICE DCV remote display server. Includes option to install GPU drivers
MIT No Attribution

Getting a failed to receive signal for the ec2Instance creation #5

Closed PeregrineFalcon closed 5 months ago

PeregrineFalcon commented 5 months ago

I am using CloudFormation and the WindowsServer-NICE_DCV.yaml template to set up a remote workstation for 3D visualization.

However, CloudFormation seems to time out waiting for the ec2Instance resource to signal that it is done (after the 90-minute wait). The Windows instance starts successfully: I can log in and run commands, and the AWS CLI is available.

Is there something I should keep in mind/modify to ensure the signal is sent after all of the configsets are done running?

limmike commented 5 months ago

Thanks for the report.

The user data in the CloudFormation template runs cfn-signal to indicate completion (below). You do not need to modify anything.

cfn-signal.exe -e %errorlevel% --stack ${AWS::StackId} --resource ec2Instance --region ${AWS::Region}

There could be some error or network connectivity issue that prevents cfn-signal execution.
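For context, cfn-signal works together with a CreationPolicy on the resource: CloudFormation waits for the signal until the timeout expires, then marks the resource as failed. The fragment below is a minimal sketch of that wiring, not a copy of the sample template. The 90-minute timeout is taken from the wait reported above, while the imageId and instanceType parameters and the elided installation steps are assumptions for illustration.

```yaml
Resources:
  ec2Instance:
    Type: AWS::EC2::Instance
    CreationPolicy:
      ResourceSignal:
        Count: 1          # wait for one success signal
        Timeout: PT90M    # assumed value; ISO 8601 duration (90 minutes)
    Properties:
      ImageId: !Ref imageId            # hypothetical parameter
      InstanceType: !Ref instanceType  # hypothetical parameter
      UserData:
        Fn::Base64: !Sub |
          <script>
          rem ... installation steps elided ...
          cfn-signal.exe -e %errorlevel% --stack ${AWS::StackId} --resource ec2Instance --region ${AWS::Region}
          </script>
```

If any step before the cfn-signal line hangs or fails, for example a driver download that cannot reach its source, the signal is never sent and the stack times out even though the instance itself is usable.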

Usually, it takes less than 20 minutes for me to provision a GPU EC2 instance. Can you delete your stack and create a new stack?

Try this first with the default settings. If that works, delete the stack and provision one with a GPU instance, i.e. specify the instanceType (g4dn? g6?) and driverType (NVIDIA-Grid?). If it does not work, let me know the AWS Region and the CloudFormation parameters you used. Thanks.

PeregrineFalcon commented 5 months ago

Thanks @limmike - yes, it succeeds with a simple instanceType (t3.minimal). However, as soon as I try a g4dn instance type, the stack never receives the signal.

I am running on GovCloud (us-gov-east-1) and I have to specify an Availability Zone, so I added the snippet below to the Properties block for ec2Instance. That worked fine for the t3.minimal instance. g4dn instance types are only available in us-gov-east-1a and us-gov-east-1b. What I ended up doing was creating the EC2 instance as a t3.minimal and then changing it to a g4dn, but something goes wrong when CloudFormation builds a g4dn EC2 instance on GovCloud.

    AvailabilityZone: !Select
      - 0
      - !GetAZs
        Ref: 'AWS::Region'

limmike commented 5 months ago

Thanks @PeregrineFalcon

When you specify subnetID, it maps to an AZ. Go to the VPC console at https://console.amazonaws-us-gov.com/vpcconsole/home#subnets: to get the subnetID-to-AZ mapping.
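As a rough illustration of this point, the fragment below lets the subnet alone determine the AZ instead of setting AvailabilityZone explicitly. It is a sketch under assumptions, not the sample template itself: only the subnetID parameter and the ec2Instance resource names come from this thread, the output name is hypothetical, and other required properties such as ImageId are omitted.

```yaml
Parameters:
  subnetID:
    Type: AWS::EC2::Subnet::Id
    Description: Subnet located in an AZ that offers the chosen instance type

Resources:
  ec2Instance:
    Type: AWS::EC2::Instance
    Properties:
      SubnetId: !Ref subnetID   # the AZ is implied by the subnet
      # No AvailabilityZone property: setting one that does not match
      # the subnet's AZ is likely to make the launch fail.

Outputs:
  availabilityZone:             # hypothetical output name
    Description: AZ the instance was actually launched in
    Value: !GetAtt ec2Instance.AvailabilityZone
```

With this approach, picking a subnet in us-gov-east-1a or us-gov-east-1b would be enough to land a g4dn instance in an AZ that offers it.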

I have updated the NVIDIA GRID driver download code, but I do not have access to a GovCloud region to test the changes.

Can you download the latest Windows-NICE-DCV.yaml and try again?

PeregrineFalcon commented 5 months ago

The changes you made seem to have worked. A g4dn.8xlarge was provisioned in under 10 minutes, then rebooted, and the GRID drivers were installed as requested. Thanks for the fix!

limmike commented 5 months ago

Awesome. Thanks for reporting!