Open joeydumont opened 5 years ago
Or the script itself could create the DHCP options set, assign it to the VPC, do its thing, re-assign the original DHCP options set, then delete the temporary set.
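A rough sketch of that swap, written against the boto3 EC2 client interface. The function name and overall flow here are illustrative, not part of pcluster; the client is passed in so the logic can be exercised without real AWS credentials:

```python
def run_with_temporary_dhcp_options(ec2, vpc_id, build_fn):
    """Create a temporary AmazonProvidedDNS options set, attach it to vpc_id,
    run build_fn() (e.g. the `pcluster create` step), then restore the
    original options set and delete the temporary one."""
    # Remember the DHCP options set currently attached to the VPC.
    vpc = ec2.describe_vpcs(VpcIds=[vpc_id])["Vpcs"][0]
    original = vpc["DhcpOptionsId"]

    # Create a temporary set with the default-DNS configuration.
    temp = ec2.create_dhcp_options(
        DhcpConfigurations=[
            {"Key": "domain-name-servers", "Values": ["AmazonProvidedDNS"]},
        ]
    )["DhcpOptions"]["DhcpOptionsId"]

    try:
        ec2.associate_dhcp_options(DhcpOptionsId=temp, VpcId=vpc_id)
        build_fn()
    finally:
        # Re-attach the original set and remove the temporary one,
        # even if build_fn raised.
        ec2.associate_dhcp_options(DhcpOptionsId=original, VpcId=vpc_id)
        ec2.delete_dhcp_options(DhcpOptionsId=temp)
```

With a real client this would be called as `run_with_temporary_dhcp_options(boto3.client("ec2"), "vpc-...", do_create)`.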
One idea might be to use private IPs everywhere but we need to verify if the current schedulers configuration and all supported features are compatible with that.
Indeed. I'm currently testing the deployment of multiple cluster configurations to see what works and what doesn't. I'm a little confused, because my most recent deployments did not even get to the second error message above. The compute instances are spun up after I run srun on the master node, but the job never reaches the compute nodes. I'm not sure what I did in my first deployment that made it work...
You mentioned that you needed to verify whether using private IPs could work. Could you tell me where to modify the codebase so that I could test that on my end? Even better, do you have a fork with that enabled?
I tried to switch the current logic to private IPs: the NFS mounts work as expected, but unfortunately the schedulers don't. Some additional work is required to reconfigure the schedulers so that they function properly with private IPs.
Ok, keep me posted!
Hi, I am facing the issue described in #1935 inside AWS EKS (where I am trying to use EFS as a persistent volume while my VPC has a custom DHCP options set), which seems somewhat related to this one.
Does anyone happen to know if there has been any update on these issues? If not, any suggestions on how I can work around this (#1935)? Any help would be really appreciated.
Please also let me know if I should open a Feature Request on any other EKS-specific repository that might help increase visibility.
Environment:
Related Issue
#1192
Bug description and how to reproduce: I am trying to set up a ParallelCluster in a sub-account inside an AWS Landing Zone managed organization. Our LZ uses an AWS Managed AD. When an account is created via the vending machine, a default DHCP options set that points to the AD DNS is assigned to the VPC created in the account. If this DHCP options set is not changed to one with

```
domain-name = <region_name>.compute.internal
domain-name-servers = AmazonProvidedDNS
```

`pcluster create` fails at the sanity check of the auto-scaling group. However, if you log in to the master server and try to schedule jobs without changing the DHCP options set back to the one that points to the AD DNS, you get errors like:
I propose that a new configuration item be added to the `[vpc]` section of the configuration, something like `dhcp_options_set_build`, that would take a DHCP options set ID. `pcluster create` could temporarily assign that DHCP options set to the VPC under consideration, and restore the original DHCP options set afterwards. This is not a super elegant solution, but it would work.
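For illustration, the proposal would look something like this in the cluster config (the `dhcp_options_set_build` key is the suggestion above, not an existing pcluster option, and the `dopt-` ID is a placeholder):

```ini
[vpc default]
vpc_id = vpc-0123456789abcdef0
master_subnet_id = subnet-0123456789abcdef0
# Hypothetical: options set assigned only for the duration of `pcluster create`
dhcp_options_set_build = dopt-0123456789abcdef0
```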