Open jvstme opened 3 months ago
Some examples of images that don't work and their respective errors:
- `nvcr.io/nim/meta/llama3-8b-instruct:latest` (or any other image with a non-root user) when run on RunPod or Vast.ai - never starts, killed by provisioning timeout
- `prom/prometheus` - Error: Distribution not supported
- `fedora` - sed: can't read /root/.profile: No such file or directory
- `gcr.io/etcd-development/etcd:v3.4.34` - exec: "/bin/sh": stat /bin/sh: no such file or directory: unknown
- `bitnami/thanos` - unable to find user root: no matching entries in passwd file
This plan covers all requirements mentioned in the OP except “The image should have `/bin/sh`”. Completing the plan would make it possible to (at least):
- […] `deb`/`rpm`-based images.
- […] `USER` from the Docker image, as it's already done for `ENTRYPOINT` and `CMD` (see `JobConfigurator`), store it as `JobSpec.user`.
- […] `user` property to the run configurations to override the default image user. If the property is set, exclude offers from backends where we cannot override the container user (RunPod, Vast.ai).
- […] `root` (if possible) to ensure that both the runner and the SSH server have sufficient permissions.
- […] `Cmd.SysProcAttr.Credential.{Uid,Gid}` set according to the `JobSpec.user`.
- […] `USER`/`user`'s and `root`'s `~/.ssh/authorized_keys`, use `USER`/`user` instead of `root` in the `~/.dstack/ssh/config` (`ssh run_name` → log in as a default/overridden user, `ssh root@run_name` → log in as root).
- […] `urllib.urlopen`, etc.) to download the runner/SSH server and fail if none available.
- Statically linked OpenSSH or crypto/ssh-based Golang implementation embedded into the runner (yet to be decided).
- […] `root`, configure `root` SSH access. In addition, if `JobSpec.user` != `root`, configure non-root SSH access. In any case, `JobSpec.user` is the default SSH user (that is, `JobSpec.user` is the `User` in the SSH client config generated by the `dstack` client).
- […] `SSHAttach`, use `JobSpec.user` as the job SSH user (currently, it's hardcoded to `root`).
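
The user-related rules in the plan above can be sketched roughly as follows. This is only an illustration under stated assumptions, not dstack's actual code: `Offer`, `resolve_job_user`, `filter_offers`, `ssh_users`, `ssh_config_entry` and the backend identifiers `"runpod"`/`"vastai"` are hypothetical names, the user is assumed to be a plain user name (not a `uid:gid` pair), and root SSH access is assumed to be always configurable.

```python
from dataclasses import dataclass
from typing import List, Optional

# Backends the plan names as unable to override the container user.
# These identifiers are hypothetical spellings, not dstack's real backend names.
NO_USER_OVERRIDE_BACKENDS = {"runpod", "vastai"}


@dataclass
class Offer:
    """Hypothetical stand-in for a backend offer."""
    backend: str
    instance: str


def resolve_job_user(image_user: Optional[str], config_user: Optional[str]) -> str:
    """Pick the value to store as JobSpec.user.

    The `user` property of the run configuration wins over the image's USER;
    an image without USER runs as root (Docker's default).
    """
    return config_user or image_user or "root"


def filter_offers(offers: List[Offer], config_user: Optional[str]) -> List[Offer]:
    """Drop offers from backends that cannot override the container user,
    but only when the run configuration sets `user` explicitly."""
    if config_user is None:
        return list(offers)
    return [o for o in offers if o.backend not in NO_USER_OVERRIDE_BACKENDS]


def ssh_users(job_user: str) -> List[str]:
    """Users that get SSH access, default SSH user first.

    Assumption: root access is always configured; a non-root job user
    is configured in addition to it.
    """
    if job_user == "root":
        return ["root"]
    return [job_user, "root"]


def ssh_config_entry(run_name: str, job_user: str) -> str:
    """Render a minimal SSH client config entry with the job user as User."""
    return f"Host {run_name}\n    User {job_user}\n"
```

With these helpers, an image whose `USER` is `nim` and no `user` override would yield `nim` as the default SSH user, while `ssh root@run_name` would still reach root because `root` remains in the list returned by `ssh_users`.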

Current
`dstack` allows running custom Docker images by specifying them in the `image` property. However, not all images can be used. These are some of the image requirements:
- `apt-get` or `yum`
- `/bin/sh`

Proposed
Drop all image requirements and support all valid Docker images, including images built `FROM scratch`.

Implementation notes
The main source of requirements seems to be the installation and configuration of the OpenSSH server. Possible ways to drop the OpenSSH-related requirements include:
- […] `dstack-runner` binary.