alibaba / clusterdata

cluster data collected from production clusters in Alibaba for cluster management research

Is a Pod's max GPU memory usage also capped by the GPU Milli specified in the GPU-v2023 trace? #207

Open matthewygf opened 8 months ago

matthewygf commented 8 months ago

A pod currently can specify:

  1. GPU count
  2. GPU Milli
  3. GPU Card-model

Does GPU Milli refer to the percentage of memory used on that specific card model? If so, how does the following row translate into the memory required?

name,cpu_milli,memory_mib,num_gpu,gpu_milli,gpu_spec
iopenb-pod-0021,8000,30517,1,440,G2|P100|T4|V100M16|V100M32

For the case of P100 (16 GB): 440/1000 * 16 = 7.04 GB?

For the case of V100M32 (32 GB): 440/1000 * 32 = 14.08 GB?
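
To make the interpretation I'm asking about concrete, here is a minimal sketch that treats gpu_milli/1000 as a fraction of a single GPU and multiplies it by each candidate card's memory size. The per-card memory sizes are my own assumption (typical values for these models), not something stated in the trace, and I'm not sure which card "G2" refers to, so it is skipped:

```python
# Assumption: gpu_milli / 1000 is a fraction of ONE GPU, applied to its memory.
# The memory sizes below are the usual ones for these models (my assumption).
GPU_MEMORY_GIB = {
    "P100": 16,
    "T4": 16,
    "V100M16": 16,
    "V100M32": 32,
    # "G2" omitted: I'm not sure which card model it denotes.
}

gpu_milli = 440                          # from the pod row quoted above
gpu_spec = "G2|P100|T4|V100M16|V100M32"  # candidate card models, "|"-separated

for card in gpu_spec.split("|"):
    mem_gib = GPU_MEMORY_GIB.get(card)
    if mem_gib is None:
        print(f"{card}: memory size unknown, skipped")
        continue
    implied_gib = gpu_milli / 1000 * mem_gib
    print(f"{card}: {implied_gib:.2f} GiB if gpu_milli caps memory")
```

Under that assumption the script prints 7.04 GiB for P100/T4/V100M16 and 14.08 GiB for V100M32, matching the arithmetic above. Is this the intended reading of gpu_milli, or is it only a compute-share limit?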

BhAem commented 6 months ago

I have the same question. Can anybody help answer it?