dstackai / dstack

dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
https://dstack.ai/docs
Mozilla Public License 2.0

Introduce Fleets for cluster provisioning and management #1327

Closed r4victor closed 2 months ago

r4victor commented 3 months ago

Currently, dstack provides the dstack pool add command to provision cloud instances and the dstack pool add-ssh command to add on-prem instances via SSH. This interface has several limitations. The major one is that there is no way to provision or add clusters. Managing clusters via pools is also not optimal.

The proposal is to introduce fleets for provisioning and managing instances and clusters of instances. Fleets are to be provisioned with dstack apply. Here's what fleet configuration will look like:

# cloud fleet
type: fleet
name: my-fleet
nodes: 4
placement: cluster/any
resources:
# ... all profile params

# ssh fleet
type: fleet
name: my-ssh-fleet
ssh:
  user: ubuntu
  ssh_key: ~/.ssh/key.pem
  port: 22
  network: "1.0.0.0/24"
  hosts:
    - "1.1.1.1"
    - "2.2.2.2"
    - hostname: "3.3.3.3"
      user: different-user
    - hostname: "4.4.4.4"
      ssh_key: ~/.ssh/different-key.pem
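
To illustrate how the hosts list in the ssh fleet example might be interpreted, here is a minimal Python sketch. It assumes that a plain string entry is shorthand for a hostname, and that per-host user and ssh_key values override the defaults given at the ssh level. That override behavior is inferred from the example above and is not a confirmed part of the proposal. ResolvedHost and resolve_hosts are hypothetical names used only for illustration; they are not part of dstack.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ResolvedHost:
    """Connection parameters for one host after defaults are applied (hypothetical type)."""
    hostname: str
    user: str
    ssh_key: str
    port: int


def resolve_hosts(ssh: Dict[str, Any]) -> List[ResolvedHost]:
    """Apply the ssh-level defaults to every host entry.

    Assumption: a bare string is shorthand for {"hostname": <string>}, and any
    key set on a host entry takes precedence over the ssh-level value.
    """
    resolved = []
    for entry in ssh["hosts"]:
        if isinstance(entry, str):
            entry = {"hostname": entry}
        resolved.append(
            ResolvedHost(
                hostname=entry["hostname"],
                user=entry.get("user", ssh["user"]),
                ssh_key=entry.get("ssh_key", ssh["ssh_key"]),
                port=entry.get("port", ssh.get("port", 22)),
            )
        )
    return resolved


# The ssh section of the example configuration, written as a Python dict.
ssh_section = {
    "user": "ubuntu",
    "ssh_key": "~/.ssh/key.pem",
    "port": 22,
    "network": "1.0.0.0/24",
    "hosts": [
        "1.1.1.1",
        "2.2.2.2",
        {"hostname": "3.3.3.3", "user": "different-user"},
        {"hostname": "4.4.4.4", "ssh_key": "~/.ssh/different-key.pem"},
    ],
}

for host in resolve_hosts(ssh_section):
    print(host)
```

Under these assumptions, the first two hosts would use ubuntu with ~/.ssh/key.pem, the third would use different-user with the default key, and the fourth would use ubuntu with ~/.ssh/different-key.pem.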

peterschmidt85 commented 3 months ago

type: instance or cluster? :)

r4victor commented 3 months ago

I'd start with instance since it's what dstack pool currently works with. We can discuss how dstack is supposed to work with clusters in a separate issue: we could expand the instance configuration to provision multiple cluster nodes (this may be as simple as specifying nodes: 8) or introduce a separate cluster configuration.

peterschmidt85 commented 3 months ago

I wouldn't rush this. Let's at least discuss it before implementing.

r4victor commented 2 months ago

The description has been updated to use the fleet configuration type.