awslabs / scale-out-computing-on-aws

Scale-Out Computing on AWS is a solution that helps customers deploy and operate a multiuser environment for computationally intensive workflows.
https://awslabs.github.io/scale-out-computing-on-aws-documentation/
Apache License 2.0

CreateWindowsDesktop: Dryrun pass but EC2 create fails due to AZ Availability #77

Open ryanschulz46 opened 1 year ago

ryanschulz46 commented 1 year ago

Describe the bug

Certain EC2 instances are not available in some AZs. For example, as of writing, Windows g3.4xlarge is only available in ap-south-1a & ap-south-1b, but not ap-south-1c.

For CreateWindowsDesktop, the AZ is not specified for either the DryRun or the actual EC2 instance creation. As a result, the AZ is selected by the boto and troposphere clients, so the DryRun and the actual EC2 instance creation can end up in different AZs.

If the dry run selects an AZ that has availability for the EC2 instance, it will pass. However, the actual EC2 instance creation could use a different AZ, one that does not offer the instance.

This causes the stack to roll back and leaves it in an unclean state. Retrying CreateWindowsDesktop with the same arguments will fail because the stack already exists.
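One possible way to avoid the mismatch is to look up which AZs offer the instance type first, then pin both the DryRun and the real launch to a subnet in one of those AZs. The following is only a minimal sketch of that idea, not how the solution is implemented: the helper names and the subnet-to-AZ mapping are hypothetical, and `ec2_client` is assumed to be a boto3 EC2 client for the target region. Note that `describe_instance_type_offerings` reports offerings per instance type, not per operating system or current capacity, so a launch could still fail for other reasons.

```python
from typing import Any, Dict, Optional, Set


def azs_offering_instance_type(ec2_client: Any, instance_type: str) -> Set[str]:
    """Return the AZ names (in the client's region) that offer instance_type.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    offered: Set[str] = set()
    kwargs: Dict[str, Any] = {
        "LocationType": "availability-zone",
        "Filters": [{"Name": "instance-type", "Values": [instance_type]}],
    }
    while True:
        response = ec2_client.describe_instance_type_offerings(**kwargs)
        for offering in response.get("InstanceTypeOfferings", []):
            offered.add(offering["Location"])
        token = response.get("NextToken")
        if not token:
            return offered
        # Follow pagination until no further token is returned.
        kwargs["NextToken"] = token


def pick_subnet(subnet_to_az: Dict[str, str], offered_azs: Set[str]) -> Optional[str]:
    """Return the first subnet whose AZ offers the instance type, else None.

    subnet_to_az is a hypothetical mapping of subnet ID -> AZ name.
    """
    for subnet_id, az in subnet_to_az.items():
        if az in offered_azs:
            return subnet_id
    return None
```

The chosen subnet ID would then be passed explicitly to both the DryRun call and the stack template, so the two steps are evaluated against the same AZ. If `pick_subnet` returns `None`, the request can be rejected before any stack is created.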

To Reproduce

Try to create a g3.4xlarge Windows desktop in ap-south-1. As stated above, this instance is only available in ap-south-1a & ap-south-1b, but not ap-south-1c (as of writing).


mcrozes commented 4 months ago

Hey @ryanschulz46 -

We are in the middle of addressing this issue by migrating our compute provisioning logic to EC2 Fleet. This will ship in a future release later this year.