NVIDIA / deepops

Tools for building GPU clusters
BSD 3-Clause "New" or "Revised" License
1.25k stars 326 forks

Ports closed on docker startup #1256

Closed: clemsgrs closed this issue 1 year ago

clemsgrs commented 1 year ago

Running Docker containers no longer have open ports when started. As a result, things like TensorBoard, Jupyter notebooks, and a remote SSH interpreter can no longer be reached.

In my case, I'm trying to remotely debug code from within a docker container via VSCode. Here's the setup:

Ultimately, I'd like to be able to map a port of the host to port 22 so that I can edit my ~/.ssh/config file and access the Docker container from within VSCode.

NB: here's the complete srun command I'm using:

srun \
  --ntasks=1 \
  --gpus=1 \
  --cpus-per-task=4 \
  --time=6-00:00:00 \
  --container-image=<docker_image> \
  --pty bash

dholt commented 1 year ago

If you're using the --container-image flag you're not using Docker, you're using Enroot/Pyxis (https://github.com/NVIDIA/enroot). Anything you run in the container will be in the same network namespace as the host, so you would want the SSH service in your container to listen on a non-privileged port that doesn't conflict with the host port 22.
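
As a rough sketch of that advice, the helper below prints a minimal sshd configuration that listens on a high port, and the comments show how it might be started inside the container. The port 2222, the file locations under $HOME/.ssh, and the helper name write_sshd_config are all assumptions for illustration. It also assumes OpenSSH is installed in the image and that your public key is already in ~/.ssh/authorized_keys; depending on how the image is set up, further options may be needed.

```shell
# Hypothetical helper: print a minimal sshd_config for an sshd started by the user.
# Arguments: <port> <host_key_path>. Both values are illustrative assumptions.
write_sshd_config() {
  port="$1"
  host_key="$2"
  cat <<EOF
Port ${port}
HostKey ${host_key}
PidFile none
PasswordAuthentication no
EOF
}

# Example use inside the container (not executed here):
#   ssh-keygen -q -t ed25519 -N "" -f "$HOME/.ssh/container_host_key"
#   write_sshd_config 2222 "$HOME/.ssh/container_host_key" > "$HOME/.ssh/sshd_config_container"
#   /usr/sbin/sshd -D -e -f "$HOME/.ssh/sshd_config_container"
```

Keeping the port in one place makes it easy to pick a different value if 2222 is already taken on the node, since the container and the host share the same ports.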

clemsgrs commented 1 year ago

True, sorry for the confusion between Docker & Enroot. Thanks for your answer. It's still unclear to me what I have to do to launch an Enroot image via srun and display the exposed ports, along with the ports they've been mapped to on the node the container is running on.

Ideally I'd like to display something like:

Exposed ports:
22/tcp -> http://<node_name>:32795
8888/tcp -> http://<node_name>:32796

I assume I'd have to add a few lines to the Dockerfile to enable this.

dholt commented 1 year ago

There's no concept of exposing ports with Enroot. If you run a service on a port in the container, it will be available on the host as well. In your case you'll need to modify the SSH server configuration in your container to run on a port other than 22, otherwise when it tries to start it will find a conflicting SSH service from the host already using port 22.
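
Because there is no port mapping, a client connects to the compute node directly on whatever port the container's SSH server listens on. A minimal sketch of a helper that prints a matching ~/.ssh/config entry follows. The helper name ssh_config_entry, the alias, the user name, and the port are hypothetical placeholders, and it assumes the node is reachable from your machine (otherwise a jump host would likely be needed).

```shell
# Hypothetical helper: print a ~/.ssh/config entry for an sshd running in the container.
# Arguments: <alias> <node_name> <port> <user>; all values are placeholders.
ssh_config_entry() {
  printf 'Host %s\n' "$1"
  printf '    HostName %s\n' "$2"
  printf '    Port %s\n' "$3"
  printf '    User %s\n' "$4"
}

# Example (not executed here): append an entry, then select "enroot-dev" in VSCode.
#   ssh_config_entry enroot-dev <node_name> 2222 "$USER" >> "$HOME/.ssh/config"
```

The Port value here is the same number the SSH server uses inside the container, since nothing is remapped between the container and the node.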

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity. Please update the issue or it will be closed in 7 days.