dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
Context
Many cloud providers bundle one or several (e.g. 16) local disks with some instance types.
Local disks have these traits:
Physically attached to the host, and hence faster than network-attached disks
Included in the instance price, with no way to opt out
Provided in addition to the main OS disk
Typically do not have a file system
Not persistent: data typically survives instance restarts but is lost when the instance is stopped
Storage capacity is fixed for a given instance type and varies between instance types
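The traits above suggest a simple way to spot local disks on a host: they are whole disks with no partitions, no file system, and no mount point. The sketch below is only a heuristic, not dstack's actual detection logic. It assumes the input comes from `lsblk --json -o NAME,TYPE,FSTYPE,MOUNTPOINT`; the function name and the sample device names are hypothetical. Note that an unformatted network volume attached to the instance would match as well.

```python
import json


def find_candidate_local_disks(lsblk_json: str) -> list[str]:
    """Return device paths of whole disks that look unused.

    Heuristic (an assumption, not a guarantee): a local disk is a device of
    type "disk" with no partitions, no file system, and no mount point.
    """
    devices = json.loads(lsblk_json)["blockdevices"]
    candidates = []
    for dev in devices:
        if dev.get("type") != "disk":
            continue
        # The OS disk normally has partitions ("children") or a file system.
        if dev.get("children") or dev.get("fstype") or dev.get("mountpoint"):
            continue
        candidates.append(f"/dev/{dev['name']}")
    return candidates


# Hypothetical lsblk output: one OS disk with a root partition, two bare disks.
SAMPLE = """
{"blockdevices": [
  {"name": "nvme0n1", "type": "disk", "fstype": null, "mountpoint": null,
   "children": [{"name": "nvme0n1p1", "type": "part",
                 "fstype": "ext4", "mountpoint": "/"}]},
  {"name": "nvme1n1", "type": "disk", "fstype": null, "mountpoint": null},
  {"name": "nvme2n1", "type": "disk", "fstype": null, "mountpoint": null}
]}
"""

print(find_candidate_local_disks(SAMPLE))  # -> ['/dev/nvme1n1', '/dev/nvme2n1']
```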
Here is how major cloud providers implement local disks:
AWS provides Instance Store, which is opt-in for some instance types but always bundled with others. Capacity varies from ~60 GB to ~336 TB.
Azure provides one Temporary Disk with most instance types. Capacity does not appear to be documented, but apparently varies from 16 GB to several terabytes.
GCP provides Local SSDs, which are opt-in for some instance types but always bundled with others. Capacity varies from 375 GB to 36 TB.
OCI provides Local Disks with some instance types. Capacity varies from ~4 TB to ~80 TB.
Problem
dstack ignores local disks, so dstack users cannot benefit from their performance and capacity, even though they pay for them.
Possible solutions
Solution 1 — create an LVM volume over the local disks and use it as docker's data_root
This way, users benefit from the local disks' performance automatically; no configuration or special handling is needed. However, if users request more disk capacity in the run configuration than the local disks offer, dstack will have to store data_root on the OS disk as usual, i.e. the local disks will still be ignored.
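A minimal sketch of the decision described in Solution 1, assuming the local disks and their sizes are already known. It only builds the commands and does not run them. The function, the volume group and logical volume names, and the mount point are hypothetical; `pvcreate`, `vgcreate`, `lvcreate`, `mkfs.ext4`, and `mount` are the standard LVM and Linux tools. The capacity check is simplified: LVM metadata takes a small amount of space, so the usable size is slightly below the sum of the disks.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass
class DataRootPlan:
    use_local_disks: bool
    commands: list[list[str]]  # shell commands to run, in order


def plan_data_root(
    local_disks: dict[str, int],  # device path -> capacity in bytes
    requested_bytes: int,
    mount_point: str = "/mnt/local-disks",  # hypothetical path
) -> DataRootPlan:
    """Decide whether Docker's data root can live on the local disks."""
    total = sum(local_disks.values())
    if not local_disks or requested_bytes > total:
        # Fallback from the text: keep the data root on the OS disk.
        return DataRootPlan(use_local_disks=False, commands=[])

    devices = sorted(local_disks)
    vg, lv = "dstack_local", "data"  # hypothetical VG / LV names
    lv_path = f"/dev/{vg}/{lv}"
    commands = [
        ["pvcreate", *devices],
        ["vgcreate", vg, *devices],
        # One logical volume spanning all free space in the volume group.
        ["lvcreate", "--extents", "100%FREE", "--name", lv, vg],
        ["mkfs.ext4", lv_path],
        ["mkdir", "-p", mount_point],
        ["mount", lv_path, mount_point],
    ]
    return DataRootPlan(use_local_disks=True, commands=commands)


disks = {"/dev/nvme1n1": 900 * GIB, "/dev/nvme2n1": 900 * GIB}
print(plan_data_root(disks, 1000 * GIB).use_local_disks)  # fits: local disks
print(plan_data_root(disks, 2000 * GIB).use_local_disks)  # too big: OS disk
```

After mounting, Docker would still need to be pointed at the mount point, for example through the `data-root` key in `/etc/docker/daemon.json`.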
Solution 2 — create an LVM volume over the local disks and mount it to a directory within the container
This way, users will be able to use both the fixed local disks and the configurable OS disk, i.e. have flexible disk capacity. However, the environment will be different on instances with and without local disks, so users' code will have to be adjusted to use local disks.
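To illustrate the trade-off in Solution 2, here is a sketch of both sides, under stated assumptions. The first function builds extra `docker run` arguments only when a local-disk volume exists on the host. The second shows the kind of adjustment user code would need. The container directory `/local-disks` and the environment variable `LOCAL_DISKS_DIR` are hypothetical names chosen for this example, not existing dstack settings.

```python
import os
import tempfile
from typing import Mapping, Optional

LOCAL_DISKS_ENV = "LOCAL_DISKS_DIR"  # hypothetical variable name


def docker_mount_args(
    host_mount_point: Optional[str],
    container_dir: str = "/local-disks",  # hypothetical container path
) -> list[str]:
    """Extra `docker run` arguments exposing the local-disk volume, if any."""
    if host_mount_point is None:
        # Instance without local disks: the container gets nothing extra.
        return []
    return [
        "--volume", f"{host_mount_point}:{container_dir}",
        "--env", f"{LOCAL_DISKS_ENV}={container_dir}",
    ]


def scratch_dir(environ: Mapping[str, str] = os.environ) -> str:
    """User-side code: prefer the local disks, else fall back to the temp dir."""
    path = environ.get(LOCAL_DISKS_ENV)
    if path and os.path.isdir(path):
        return path
    return tempfile.gettempdir()


print(docker_mount_args("/mnt/local-disks"))
print(docker_mount_args(None))  # -> []
```

The fallback in `scratch_dir` is the adjustment the text refers to: the same code has to work both with and without the extra directory.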