aws / aws-parallelcluster

AWS ParallelCluster is an AWS supported Open Source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
https://github.com/aws/aws-parallelcluster
Apache License 2.0
816 stars 309 forks

Pcluster CLI executing commands over the Internet #4066

Open jlanzarotti opened 2 years ago

jlanzarotti commented 2 years ago

I have a use case where the head nodes are not exposed to the Internet, especially the SSH port. Running pcluster commands from a proxy inside the AWS network doesn't work, because the pcluster CLI seems to connect to the public IP of the head node:

2022-05-30 03:29:04,393 - DEBUG - dcv_connect.py:142:_retry() - ssh: connect to host XXXX.118 port 22: Connection timed out

Adding a specific security group rule to allow the SSH connection from the public IP works without problems.

Would it be possible for the tool to check first whether it can reach the private IP of the head node?

Thanks

enrico-usai commented 2 years ago

Hi @jlanzarotti. The CLI uses the public IP if one is available; if not, it falls back to the private one. This is by design, because most users have the head node in a public subnet and the compute nodes in a private one.
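To illustrate the behavior described above, here is a minimal sketch of a "public IP if available, otherwise private IP" selection. It is not the actual pcluster CLI code: the function name `select_head_node_ip` and the example addresses are assumptions made for illustration only.

```python
from typing import Optional


def select_head_node_ip(public_ip: Optional[str], private_ip: str) -> str:
    """Hypothetical helper: prefer the public IP and fall back to the private one.

    This mirrors the behavior described in the thread. It does not check
    whether the chosen address is actually reachable from the caller.
    """
    return public_ip if public_ip else private_ip


# Head node with a public IP (or EIP): the public address is chosen,
# even if the caller could only reach the private one.
print(select_head_node_ip("203.0.113.10", "10.0.0.5"))

# Head node without a public IP: the private address is used.
print(select_head_node_ip(None, "10.0.0.5"))
```

Because the choice depends only on whether a public IP exists, a caller inside the VPC that is blocked from the public address still gets the public address, which matches the timeout reported in this issue.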

You're facing one of the known limitations when creating a cluster in a private subnet.

Did you check the auto-assign public IP setting of your subnet?

Before you create your cluster, verify that auto-assign public IPv4 address is disabled in the subnet to ensure that the pcluster commands have access to the cluster.

Enrico

jlanzarotti commented 2 years ago

Hi @enrico-usai thanks for your reply, I understood your explanation.

I'm auto-assigning the public IP in the subnet and also using an EIP for the head node. However, SSH access is restricted to a proxy/instance in the same VPC. This instance can reach the head node over SSH via the private IP, but not via the EIP/Internet. That's why I'm facing the issue.

Q: Wouldn't it be better to test the private IP first and then the public one? IMO you should always try the private IP first and avoid sending traffic out over the Internet.
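The private-first order suggested here could be sketched as follows. This is only an illustration of the idea, not something the pcluster CLI implements: `can_connect`, `select_reachable_ip`, the 3-second timeout, and the example addresses are all assumptions. The probe is passed in as a parameter so the selection logic can be exercised without real network access.

```python
import socket
from typing import Callable, Optional


def can_connect(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


def select_reachable_ip(
    private_ip: str,
    public_ip: Optional[str],
    probe: Callable[[str], bool] = can_connect,
) -> str:
    """Hypothetical private-first selection: try the private IP, then the public one."""
    if probe(private_ip):
        return private_ip
    if public_ip and probe(public_ip):
        return public_ip
    raise ConnectionError("head node is not reachable on its private or public IP")


# Simulated caller inside the VPC: only the private address answers on port 22.
reachable_from_vpc = {"10.0.0.5"}
print(select_reachable_ip("10.0.0.5", "203.0.113.10", probe=lambda ip: ip in reachable_from_vpc))
```

One trade-off of this approach is latency: a caller outside the VPC would wait for the private-IP probe to time out before the public IP is tried.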

For now I was able to bypass the issue by adding the EIP to the security group for SSH, but it's not ideal, and I only keep this head node in the public subnet due to a requirement for DCV access from the Internet.

Thanks!

demartinofra commented 2 years ago

I think it makes sense to add a flag to force the usage of private IP in the CLI. I'll mark this as a feature enhancement.
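As a rough sketch of what such a flag might look like, the example below uses `argparse` with a made-up option name. `--use-private-ip` is purely hypothetical and is not an existing pcluster option; the function names and addresses are assumptions as well.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser with a hypothetical flag that forces the private IP."""
    parser = argparse.ArgumentParser(prog="example-cli")
    parser.add_argument(
        "--use-private-ip",  # hypothetical flag name, not a real pcluster option
        action="store_true",
        help="always connect to the head node through its private IP",
    )
    return parser


def resolve_ip(argv: List[str], public_ip: Optional[str], private_ip: str) -> str:
    """Pick the head node address, honoring the hypothetical flag."""
    args = build_parser().parse_args(argv)
    if args.use_private_ip or not public_ip:
        return private_ip
    return public_ip


# Without the flag the public IP is preferred; with it, the private IP is forced.
print(resolve_ip([], "203.0.113.10", "10.0.0.5"))
print(resolve_ip(["--use-private-ip"], "203.0.113.10", "10.0.0.5"))
```

An explicit flag keeps the default behavior unchanged for users whose head node sits in a public subnet, while letting callers inside the VPC opt in to the private address.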

nahkbce2 commented 1 year ago

@demartinofra @enrico-usai We have a related problem with our setup. Recently, for some reason, our organization made this change to the IAM policy [image], which prevents the creation of any EC2 instance unless the flag --no-associate-public-ip-address is passed to the ec2 run-instances command. This causes pcluster create-cluster to fail with a "You are not authorized to perform operation " error. Is there a workaround for this problem until this enhancement is implemented?

Thanks, Arijit

enrico-usai commented 1 month ago

Hi @nahkbce2, sorry for the long delay. I just noticed your message. We implemented a feature that should resolve your issue: https://github.com/aws/aws-parallelcluster/pull/5683

You can now set AssignPublicIp: false in the configuration, and this should be enough to satisfy the resource conditions in your IAM policy.