Closed PeregrineFalcon closed 5 months ago
Thanks for the report.
The user data in the CloudFormation template runs cfn-signal to indicate completion (below). You do not need to modify anything.
cfn-signal.exe -e %errorlevel% --stack ${AWS::StackId} --resource ec2Instance --region ${AWS::Region}
An error or a network connectivity issue could be preventing cfn-signal from running.
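If it helps to rule out connectivity, one option is to re-run the same signal by hand from the instance. This is a minimal sketch that only assembles the command shown above; the stack name `my-dcv-stack` is a placeholder, and sending the signal manually is a troubleshooting assumption, not something the template requires.

```python
def build_cfn_signal_command(stack, resource, region, exit_code=0):
    """Return the argument list for a manual cfn-signal call on Windows.

    The flags mirror the command in the template's user data:
    -e (exit code), --stack, --resource, --region.
    """
    return [
        "cfn-signal.exe",
        "-e", str(exit_code),
        "--stack", stack,
        "--resource", resource,
        "--region", region,
    ]


if __name__ == "__main__":
    # "my-dcv-stack" is a hypothetical stack name; substitute your own.
    command = build_cfn_signal_command("my-dcv-stack", "ec2Instance", "us-gov-east-1")
    # Print the command so it can be pasted into a prompt on the instance.
    print(" ".join(command))
```

If the pasted command hangs or errors out on the instance, that points at networking or credentials rather than at the template.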
Usually, it takes less than 20 minutes for me to provision a GPU EC2 instance. Can you delete your stack and create a new stack?
Try this first with default settings. If this works, delete it and provision one with a GPU instance, i.e. specify the instanceType (g4dn? g6?) and driverType (NVIDIA-Grid?). Let me know the AWS Region and CloudFormation parameters you use if it does not work. Thanks
Thanks @limmike - yeah I was able to have it succeed with a simple instanceType (t3.minimal), however the second I try and do a g4dn instance type it never receives the signal.
I am running on GovCloud: us-gov-east-1
and I do have to specify an Availability Zone, so I added one to the Properties block for ec2Instance. That worked just fine for the t3.minimal instance. g4dn instance types are only available in us-gov-east-1a and us-gov-east-1b. What I ended up doing is creating the EC2 instance as a t3.minimal and then changing it to a g4dn, but yeah, something goes wrong when CloudFormation builds a g4dn EC2 instance on GovCloud.
AvailabilityZone: !Select
  - 0
  - !GetAZs
    Ref: 'AWS::Region'
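Since the instance type is only offered in some zones, it may be worth checking the offerings before pinning an Availability Zone. The sketch below filters a response shaped like the EC2 `DescribeInstanceTypeOfferings` output (for example from boto3's `describe_instance_type_offerings` with `LocationType="availability-zone"`). The sample data is made up for illustration and is not real API output.

```python
def zones_offering(response, instance_type):
    """Return the sorted Availability Zones that offer instance_type."""
    return sorted(
        offering["Location"]
        for offering in response.get("InstanceTypeOfferings", [])
        if offering.get("InstanceType") == instance_type
        and offering.get("LocationType") == "availability-zone"
    )


# Illustrative data only; the g4dn zones follow what is reported in this thread.
sample_offerings = {
    "InstanceTypeOfferings": [
        {"InstanceType": "g4dn.8xlarge", "LocationType": "availability-zone",
         "Location": "us-gov-east-1b"},
        {"InstanceType": "g4dn.8xlarge", "LocationType": "availability-zone",
         "Location": "us-gov-east-1a"},
        {"InstanceType": "t3.medium", "LocationType": "availability-zone",
         "Location": "us-gov-east-1c"},
    ]
}

print(zones_offering(sample_offerings, "g4dn.8xlarge"))
```

A zone missing from the result means an instance of that type cannot launch there, whatever the template selects.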
Thanks @PeregrineFalcon
When you specify subnetID, it maps to an AZ. Go to the VPC console at https://console.amazonaws-us-gov.com/vpcconsole/home#subnets: to get the subnetID to AZ mapping.
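The same subnet to AZ mapping can also be read programmatically. This is a rough sketch that assumes a response shaped like the EC2 `DescribeSubnets` output (for example from boto3's `describe_subnets`); the subnet IDs are placeholders.

```python
def subnet_to_az(response):
    """Map each subnet ID to its Availability Zone."""
    return {
        subnet["SubnetId"]: subnet["AvailabilityZone"]
        for subnet in response.get("Subnets", [])
    }


# Placeholder subnet IDs, for illustration only.
sample_subnets = {
    "Subnets": [
        {"SubnetId": "subnet-aaaa1111", "AvailabilityZone": "us-gov-east-1a"},
        {"SubnetId": "subnet-bbbb2222", "AvailabilityZone": "us-gov-east-1c"},
    ]
}

for subnet_id, zone in subnet_to_az(sample_subnets).items():
    print(subnet_id, zone)
```

Picking a subnet whose zone offers the GPU instance type avoids a mismatch between the subnet and an explicit AvailabilityZone property.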
I have updated the NVIDIA GRID driver download code, but I do not have access to a GovCloud region to test the changes.
Can you download the latest Windows-NICE-DCV.yaml and try again?
The changes you made seemed to have worked. A g4dn.8xlarge was provisioned in under 10 min, was rebooted and the GRID drivers were installed as requested. Thanks for the fix!
Awesome. Thanks for reporting!
I am using CloudFormation and the WindowsServer-NICE_DCV.yaml to get a remote workstation for 3D visualization working.
However, CloudFormation seems to time out waiting for the ec2Instance to signal that it is done (after the 90 min wait). I can see the Windows image start up, I can log in, and I can run commands. The AWS CLI is available.
Is there something I should keep in mind/modify to ensure the signal is sent after all of the configsets are done running?