NVIDIA / pyxis

Container plugin for Slurm Workload Manager
Apache License 2.0

Connect to Pyxis Container from VSCode #149

Closed ECMGit closed 4 days ago

ECMGit commented 2 months ago

Hi there,

I can reach the compute node from VSCode with ssh <username>@<ip>. The compute node was requested with srun -N1 -G1 -c16 --mem=128G --time=12:00:00 --container-name=<name> --container-image <file name>.sqsh --container-mounts=$WORK:/workspace/work --pty $SHELL, and everything works well.

When VSCode connects to the compute node over ssh, I have to run srun --overlap --jobid=<jobid> --pty bash in every terminal session to enter the running container session; otherwise the VSCode terminal only enters the host session. I would also like to know how to find the container's Python interpreter from VSCode. It looks like /usr/bin/python on the compute node is not the same as /usr/bin/python in the running container.

Thanks!
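
To avoid retyping the attach command in every terminal, it can be wrapped in a small shell function. This is only a sketch: attach_cmd is a hypothetical helper name, the srun flags are copied unchanged from the post above, and the function prints the command instead of running it so that it can be checked first.

```shell
# Hypothetical helper (not part of pyxis or Slurm): print the srun command
# that attaches a new shell to an already running job.
# $1 = Slurm job ID, $2 = shell to start (defaults to bash).
attach_cmd() {
    jobid="$1"
    shell="${2:-bash}"
    # Flags are the ones quoted in the post; nothing is executed here.
    printf 'srun --overlap --jobid=%s --pty %s\n' "$jobid" "$shell"
}
```

Once the printed command looks right, it can be pasted into the VSCode terminal, for example the output of attach_cmd <jobid>.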

flx42 commented 1 month ago

It's not something I can really help you with, as I haven't tried this workflow. I'm afraid you will have to investigate yourself what VSCode is doing.

If you have a specific question about how pyxis is behaving and why, I can help with that. For instance:

> Looks like /usr/bin/python in compute node are not same as the /usr/bin/python in running container.

Yes, that's normal, they don't have to match.

ECMGit commented 1 month ago

Thank you! I found that enroot list -f can find the running container ID, which lets me enter the running container session from a terminal. May I ask how I can locate the Python interpreter in the container if my terminal session is on the host? Is there a mapped path in the host node's file system?

itzsimpl commented 1 week ago

@ECMGit you can use VSCode tunnels; on our cluster this workflow works quite nicely (https://docs.rdc.si/FRIDA/slurm/#code-tunnel). All you need is the vscode cli, which you can download from https://update.code.visualstudio.com/latest/cli-alpine-x64/stable.

ECMGit commented 5 days ago

Hi @itzsimpl, thanks for sharing your solution. I have a question: does the stunnel alias come with code_tunnel, and what does code_tunnel mean here? Right now I can set up a tunnel on the remote HPC login node, but it looks like your setup starts a VSCode server inside the running container with srun <start a job with container> --pty code_tunnel. Would you mind sharing more details of your solution?

Thank you so much!

itzsimpl commented 4 days ago

Yes, in our case stunnel is an alias and code_tunnel is an additional script that sets some environment variables we use to keep naming consistent. In essence, though, the trick is just to run code tunnel. For example, assuming you downloaded code and have it in the current folder, run:

$ srun -c32 --mem 64GB --gpus=1 --container-image=nvcr.io#nvidia/pytorch:23.10-py3 --container-mounts=./code:/code --job-name=test --pty /code tunnel --accept-server-license-terms
srun: job 818413 queued and waiting for resources
srun: job 818413 has been allocated resources
pyxis: importing docker image: nvcr.io#nvidia/pytorch:23.10-py3
pyxis: imported docker image: nvcr.io#nvidia/pytorch:23.10-py3
*
* Visual Studio Code Server
*
* By using the software, you agree to
* the Visual Studio Code Server License Terms (https://aka.ms/vscode-server-license) and
* the Microsoft Privacy Statement (https://privacy.microsoft.com/en-US/privacystatement).
*
? How would you like to log in to Visual Studio Code? ›
  Microsoft Account
❯ GitHub Account
...

Then just follow the instructions.
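
The download and launch steps could be sketched roughly as follows. The helper names fetch_code_cli and tunnel_cmd are hypothetical, and the sketch assumes the download URL serves a gzip-compressed tarball that unpacks to a single code binary; please verify that on your system. tunnel_cmd only prints the srun command from this comment (without the resource flags, which are site-specific) so it can be reviewed before running.

```shell
# Sketch only: fetch_code_cli and tunnel_cmd are hypothetical helper names.
CLI_URL="https://update.code.visualstudio.com/latest/cli-alpine-x64/stable"

# Download the VSCode CLI archive and unpack it into the current folder.
# Assumption: the archive is a .tar.gz containing one binary named `code`.
fetch_code_cli() {
    curl -fL -o vscode_cli.tar.gz "$CLI_URL" && tar -xzf vscode_cli.tar.gz
}

# Print (do not run) the srun command that starts `code tunnel` inside
# the container. $1 = container image in the form used in the thread.
tunnel_cmd() {
    image="$1"
    printf 'srun --container-image=%s --container-mounts=./code:/code --pty /code tunnel --accept-server-license-terms\n' "$image"
}
```

For example, run fetch_code_cli once on the login node, then run the command printed by tunnel_cmd 'nvcr.io#nvidia/pytorch:23.10-py3' after adding the resource flags your cluster needs.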

ECMGit commented 4 days ago

It works! Thank you so much!!!!!