alibaba / clusterdata

cluster data collected from production clusters in Alibaba for cluster management research

Why does evaluator for an inference job consume so much time in the cluster-trace-gpu-v2020? #197

Open cashey opened 10 months ago

cashey commented 10 months ago

1. As shown in the picture, "evaluator" is the task for an inference job, and its "runtime" is huge: 1695265375378.
2. In the paper (MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters), Figure 4a, the task run time also starts at 10 s.

image
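To give a sense of scale for that value, here is a minimal sketch that converts it under a few readings. The unit of `runtime` is not stated above, so seconds, milliseconds, and "epoch timestamp in milliseconds" are all assumptions, and the helper name `as_days` is only illustrative.

```python
from datetime import datetime, timezone

# Value quoted above; its unit is an assumption in every reading below.
RUNTIME = 1695265375378

SECONDS_PER_DAY = 86_400


def as_days(value, unit_seconds):
    """Convert a duration to days, given the length of one unit in seconds."""
    return value * unit_seconds / SECONDS_PER_DAY


# Reading 1: the value is a duration in seconds.
days_if_seconds = as_days(RUNTIME, 1.0)

# Reading 2: the value is a duration in milliseconds.
days_if_millis = as_days(RUNTIME, 1e-3)

# Reading 3: the value is not a duration but a Unix timestamp in milliseconds.
as_epoch_ms = datetime.fromtimestamp(RUNTIME / 1000, tz=timezone.utc)

print(f"as seconds:      {days_if_seconds:,.0f} days")
print(f"as milliseconds: {days_if_millis:,.0f} days")
print(f"as epoch (ms):   {as_epoch_ms.isoformat()}")
```

Read as a duration, the value spans decades even in milliseconds, which is far beyond any inference task. Read as a millisecond timestamp, it falls in September 2023, so the value may be a timestamp rather than a duration, though that is only a guess.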

An inference job such as image classification does not need 10 s. So are there no such jobs in the cluster? And which jobs consume so much time?

Thank you very much!