Closed mklissa closed 6 years ago
SageMaker doesn't support SSH access to running jobs or endpoints. There are a couple of ways to get files into your instances:
There's currently no way to do remote debugging of a training job. You might be able to do this by using a customized container to run your job in local mode.
If you have another instance that you can ssh into from both the instance and your local machine, then you can tunnel through and achieve ssh access. I'm using this for the same purpose of SCPing stuff in and out.
For example, assuming "bastion" is the additional middle instance:
# run this command from within a terminal on your notebook instance (New -> Terminal), pushes port 22 to bastion's locally accessible port 10022
sh-4.2$ ssh user@bastion -R 10022:localhost:22 -f -N
# run this command from your local machine, pulls port 10022 of the bastion to local machine port 10022
[you@yourmachine]$ ssh user@bastion -L 10022:localhost:10022 -f -N
# now you can ssh or scp as you'd like, using the localhost port 10022 as the target
[you@yourmachine]$ ssh localhost -p 10022 -l ec2-user
You'll of course have to take care of authentication in the right directions (e.g. create private keys and add to authorized_keys as applicable).
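For the authorized_keys part, here is a minimal sketch of a helper you could run on the side that accepts the connection. The function name `add_authorized_key` is my own (hypothetical), and it assumes a standard OpenSSH layout where the key file defaults to `~/.ssh/authorized_keys`.

```shell
# Hypothetical helper: append a public key to authorized_keys only if it is
# not already there, and set the permissions sshd normally expects.
add_authorized_key() {
  pubkey="$1"                                   # full public key line
  keyfile="${2:-$HOME/.ssh/authorized_keys}"    # optional target file
  keydir=$(dirname "$keyfile")
  mkdir -p "$keydir" && chmod 700 "$keydir"
  touch "$keyfile" && chmod 600 "$keyfile"
  # -x: match the whole line, -F: treat the key as a fixed string
  grep -qxF -- "$pubkey" "$keyfile" || printf '%s\n' "$pubkey" >> "$keyfile"
}

# Example (on the notebook instance), pasting the key from your machine:
#   add_authorized_key "ssh-rsa AAAA... you@yourmachine"
```

Calling it twice with the same key leaves a single entry, so it should be safe to rerun after an instance restart.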
This is now solved via https://github.com/aws-samples/sagemaker-ssh-helper
@mklissa I know this is quite late, but it looks like AWS has thought about your particular use case: Tutorial: Set Up PyCharm Professional with a Development Endpoint. It works via AWS Glue's ability to create a developer endpoint. However, it looks like it only supports Py2.7.
AWS does not natively support SSH-ing into SageMaker notebook instances, but nothing really prevents you from setting up SSH yourself.
The only problem is that these instances do not get a public IP address, which means you have to either create a reverse proxy (with ngrok for example) or connect to it via bastion box.
Steps to make the ngrok solution work:
1. Download ngrok: curl https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip > ngrok.zip
2. Unpack it: unzip ngrok.zip
3. Run ./ngrok authenticate with your token.
4. Start the tunnel: ./ngrok tcp 22 > ngrok.log & (& will put it in the background)
5. Create the ~/.ssh/authorized_keys file (on SageMaker) and paste your public key (likely ~/.ssh/id_rsa.pub from your computer).
6. Connect: ssh -p <port_from_ngrok_logfile> ec2-user@0.tcp.ngrok.com (or whatever host they assign to you, it's going to be in the ngrok.log)
If you want to automate it, I suggest using lifecycle configuration scripts.
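To avoid reading the host and port out of ngrok.log by hand, something like the following could work. It is only a sketch: `ngrok_endpoint` is a hypothetical name, and it assumes the log contains the tunnel address as a `tcp://HOST:PORT` URL, which I have not checked against every ngrok version.

```shell
# Hypothetical helper: print "HOST PORT" for the last tcp://HOST:PORT
# address found in the given log file (prints nothing if there is none).
ngrok_endpoint() {
  sed -n 's|.*tcp://\([^: ]*\):\([0-9][0-9]*\).*|\1 \2|p' "$1" | tail -n 1
}

# Example usage on the machine that holds a copy of the log:
#   set -- $(ngrok_endpoint ngrok.log)
#   ssh -p "$2" ec2-user@"$1"
```

Taking the last match is a deliberate choice, so that a restarted tunnel with a new port wins over older entries in the same log.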
Another good trick is wrapping downloading, unzipping, authenticating and starting ngrok into some binary in /usr/bin so you can just call it from SageMaker console if it dies.
It's a little bit too long to explain completely how to automate it with lifecycle scripts, but I've written a detailed guide on https://biasandvariance.com/sagemaker-ssh-setup/.
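A rough sketch of the wrapper idea, in case it helps. The names `run` and `restart_ngrok` and the `DRY_RUN` switch are hypothetical, and I'm using `authtoken` rather than `authenticate` because that is what newer ngrok versions are reported to need later in this thread. The commands themselves are the same ones as in the steps above.

```shell
# Hypothetical wrapper (names are illustrative, not part of ngrok or AWS).
# With DRY_RUN=1 each command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

restart_ngrok() {
  # $1: your ngrok auth token
  run curl -sS -o ngrok.zip https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
  run unzip -o ngrok.zip
  run ./ngrok authtoken "$1"
  # Start the tunnel in the background, writing its output to ngrok.log
  run sh -c './ngrok tcp 22 > ngrok.log &'
}

# Example: preview what would happen without touching anything
#   DRY_RUN=1; restart_ngrok "<AUTHTOKEN>"
```

Saved as a script somewhere on the PATH, this gives you a single command to rerun from a terminal when the tunnel dies.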
Thank you @mariokostelac! I used the most recent ngrok and needed to change two things:
./ngrok authtoken <AUTHTOKEN>
This is now solved via https://github.com/aws-samples/sagemaker-ssh-helper
This can also be solved via https://docs.aws.amazon.com/systems-manager/latest/userguide/managed_instances.html by setting up the SageMaker machine as if it were an on-prem computer that AWS SSM can manage; one can then ssh/scp/tunnel into it.
laptop> $ aws ssm start-session --region=eu-central-1 --target i-083ee1e47a95416c3
Starting session with SessionId: lgallucci-0d662d7d50462b043
ec2> $ nvidia-smi
Thu Nov 19 08:58:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01 Driver Version: 440.33.01 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M60 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P8 14W / 150W | 0MiB / 7618MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
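For plain ssh/scp on top of the SSM session, one option is to use the session as an SSH ProxyCommand. This is a sketch under assumptions: the Session Manager plugin is installed on the laptop, sshd is running on the target with your key in authorized_keys, and SSH sessions through the `AWS-StartSSHSession` document are allowed for that target. The function name `ssm_proxy_command` and the `ec2-user` login are hypothetical.

```shell
# Hypothetical helper: print a ProxyCommand string that tunnels SSH through
# an SSM session. ssh itself expands %h (target id) and %p (port).
ssm_proxy_command() {
  region="$1"
  printf 'aws ssm start-session --region %s --target %%h --document-name AWS-StartSSHSession --parameters portNumber=%%p' "$region"
}

# Example usage from the laptop (target id taken from the session above):
#   ssh -o ProxyCommand="$(ssm_proxy_command eu-central-1)" ec2-user@i-083ee1e47a95416c3
#   scp -o ProxyCommand="$(ssm_proxy_command eu-central-1)" file.txt ec2-user@i-083ee1e47a95416c3:
```

The same string can go into a `ProxyCommand` line in `~/.ssh/config` if you prefer not to pass it on every call.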
How do I know my SageMaker Studio notebook target id?
This is great, thanks a lot for that information. I'll try to set it up soon.
@hanan-vian SM doesn't give you any target id; you have to do everything yourself, as if it were some computer box in your basement (so to speak). Update: this is now solved via https://github.com/aws-samples/sagemaker-ssh-helper
@elgalu if I understand you correctly, I have to start an EC2 instance with a Deep Learning AMI? I cannot use this together with Estimator.fit() using the SDK?
@philschmid we are discussing SSH access in SageMaker Studio/Notebooks in this thread. With EC2 you can already ssh, it's solved there.
I am using SM with a custom Docker image, not a prebuilt AMI and not a notebook. I didn't find the instance id on the training job page. Did you find the instance id? @philschmid @elgalu
I tried getting instance metadata by logging into CloudWatch, but curling metadata or dynamic data (doc) didn't return a response, not even a 400-level error. Based on this doc, there are 3 possible solutions (using session-oriented IMDSv2, increasing the hop limit and turning on metadata access). I will continue investigating this.
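For reference, a session-oriented (IMDSv2) request looks roughly like this. `imds_get` and the `IMDS_BASE` override are hypothetical names for this sketch; 169.254.169.254 and the two token headers are the standard EC2 metadata endpoint and headers. Inside a SageMaker-managed container this may still return nothing, so treat it as a diagnostic only.

```shell
# Hypothetical helper: fetch one metadata path using an IMDSv2 session token.
# IMDS_BASE can be overridden for testing; it defaults to the EC2 endpoint.
IMDS_BASE="${IMDS_BASE:-http://169.254.169.254}"

imds_get() {
  # Step 1: request a short-lived session token; give up quickly if the
  # metadata service is unreachable or refuses the request.
  token=$(curl -sS -f --max-time 2 -X PUT "$IMDS_BASE/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  # Step 2: send the token along with the actual metadata request.
  curl -sS -f --max-time 2 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_BASE/latest/meta-data/$1"
}

# Example usage:
#   imds_get instance-id || echo "metadata service not reachable"
```

A non-zero return with no output is consistent with the behaviour described above, where the endpoint did not answer at all.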
Got reply from AWS support:
Unfortunately at this moment, it is not possible to do so. As you may already know, the EC2 instances that are spun up sits in SageMaker Service Team's account so for security purposes, SSH into the instances are not permitted. If you wish to debug your training job, I'd suggest you to use local mode. Note that local mode is not available inside SM studio because a container inside a container is unstable.
I've found not being able to SSH to notebook instances too limiting so I've built a guide to set it up by using the bastion box. https://ruslanmv.com/blog/How-to-connect-to-Sagemaker-Notebook-via-SSH I hope this can be helpful.
I know this thread is quite old, but developers keep bumping into this discussion when searching for SageMaker and SSH.
Now there's an AWS repo with sample scripts to automate the SSH setup: https://github.com/aws-samples/sagemaker-ssh-helper.
It uses the managed instances capability of AWS Systems Manager (SSM), as suggested earlier by @elgalu.
As a result, the solution is secure, serverless, and supports not only connection into running jobs and endpoints with SSH/SSM, but also into SageMaker Studio containers, and allows integration with PyCharm and VSCode.
really cool @ivan-khvostishkov, that's very helpful
For SageMaker inference endpoints, you can now use SSM to get shell-level access to the container by enabling it through the API: https://docs.aws.amazon.com/sagemaker/latest/dg/ssm-access.html
I am trying to connect to a SageMaker instance through SSH from my local machine, but I cannot find a way to do it. This seems like important functionality, either for debugging (through PyCharm) or for uploading files with SCP. I am wondering if there is any way to do this?