dstackai / dstack

dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
https://dstack.ai/docs
Mozilla Public License 2.0

[Feature]: Show all jobs in runs UI #1723

Open jvstme opened 3 hours ago

jvstme commented 3 hours ago

Problem

The runs list page (/runs) and run details page (/projects/<project>/runs/<run>) only show details about one job from the run: backend, region, instance ID, price, etc. If the run has multiple jobs, the user cannot see details or logs of other jobs.

Showing the details of a single job as the details of the whole run can also be misleading. For example, the price shown in the UI looks like the price of the whole run, but it is actually the price of only one job, which is only part of the run's price.
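
To illustrate the price mismatch, here is a rough sketch. It assumes each job carries its own hourly price and that the run's price is the sum over its jobs. The `Job` class and `run_price` helper are hypothetical names for illustration, not dstack's actual models:

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical shape for illustration only, not dstack's real job model.
    replica_num: int
    job_num: int
    price: float  # hourly price of the instance this job runs on


def run_price(jobs):
    """Sum the per-job prices (assumption: run price = sum of job prices)."""
    return sum(job.price for job in jobs)


# Two jobs of a multi-node run, priced as in the CLI output in this issue.
multi_node = [Job(0, 0, 0.0059), Job(0, 1, 0.0059)]

single_job_price = multi_node[0].price  # what the UI currently displays
total_price = run_price(multi_node)     # what the user likely expects to see

print(f"one job: ${single_job_price:.4f}, whole run: ${total_price:.4f}")
```

Whether finished jobs should still count toward the displayed total is a separate decision that this sketch does not address.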

Solution

On the runs list page (/runs), show all jobs of each run, e.g. the way they are shown in the CLI:

> dstack ps -w
 NAME         BACKEND  REGION     RESOURCES                   SPOT  PRICE    STATUS        SUBMITTED   
 single-node  aws      us-west-2  1xCPU, 2GB, 100.0GB (disk)  yes   $0.0059  provisioning  1 min ago   
 multi-node                                                                  running       5 mins ago  
   replica 0  aws      us-west-2  1xCPU, 2GB, 100.0GB (disk)  yes   $0.0059  running       5 mins ago  
   job_num 0                                                                                           
   replica 0  aws      us-west-2  1xCPU, 2GB, 100.0GB (disk)  yes   $0.0059  done          5 mins ago  
   job_num 1                                                                                           
 httpbin                                                                     running       11 mins ago 
   replica 0  aws      us-west-2  1xCPU, 2GB, 100.0GB (disk)  yes   $0.0059  running       11 mins ago 
   job_num 0                                                                                           
   replica 1  aws      us-west-2  1xCPU, 2GB, 100.0GB (disk)  yes   $0.0059  running       11 mins ago 
   job_num 0                                                                                           

On the run details page (/projects/<project>/runs/<run>), show the run details at the top and details for each job separately below.

Show logs per-job too.

Additional information

The API already seems to provide all the data, so changes are only needed in the frontend.

peterschmidt85 commented 2 hours ago

Still would like to better understand the use case