berkeley-dsep-infra / data100-19s


Observation: CPU Bound #100

Open simon-mo opened 5 years ago

simon-mo commented 5 years ago

It's near the end of the semester here for DS100 and we are teaching more compute-intensive applications. It seems our cluster is currently CPU bound:

NAME                        CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
aks-nodepool1-24491010-0    16057m       101%      18131Mi         15%
aks-nodepool1-24491010-1    15914m       100%      29434Mi         24%
aks-nodepool1-24491010-10   15972m       100%      11196Mi         9%
aks-nodepool1-24491010-11   16003m       100%      11785Mi         9%
aks-nodepool1-24491010-2    15841m       99%       23929Mi         20%
aks-nodepool1-24491010-3    16024m       100%      23517Mi         19%
aks-nodepool1-24491010-4    1509m        9%        22892Mi         19%
aks-nodepool1-24491010-5    15833m       99%       14209Mi         11%
aks-nodepool1-24491010-6    15985m       100%      17133Mi         14%
aks-nodepool1-24491010-7    15756m       99%       16861Mi         14%
aks-nodepool1-24491010-8    16016m       100%      10850Mi         9%
aks-nodepool1-24491010-9    15718m       98%       12764Mi         10%
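
For anyone who wants to check a listing like the one above programmatically, here is a minimal sketch in Python. It assumes the text has the same five whitespace-separated columns as the table (as printed by `kubectl top nodes`). The helper names `parse_top_nodes` and `saturated` and the 95% threshold are made up for illustration; they are not part of any tool mentioned in this thread.

```python
# Three rows copied from the listing above, used as sample input.
SAMPLE = """\
NAME                        CPU(cores)   CPU%      MEMORY(bytes)   MEMORY%
aks-nodepool1-24491010-0    16057m       101%      18131Mi         15%
aks-nodepool1-24491010-4    1509m        9%        22892Mi         19%
aks-nodepool1-24491010-9    15718m       98%       12764Mi         10%
"""


def parse_top_nodes(text):
    """Parse five-column node usage text into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, cpu, cpu_pct, mem, mem_pct = line.split()
        rows.append({
            "name": name,
            "cpu_millicores": int(cpu.rstrip("m")),
            "cpu_pct": int(cpu_pct.rstrip("%")),
            "mem_mib": int(mem.replace("Mi", "")),
            "mem_pct": int(mem_pct.rstrip("%")),
        })
    return rows


def saturated(rows, threshold=95):
    """Return names of nodes whose CPU% is at or above the threshold.

    The default threshold of 95 is an arbitrary, hypothetical cutoff.
    """
    return [r["name"] for r in rows if r["cpu_pct"] >= threshold]


rows = parse_top_nodes(SAMPLE)
saturated_nodes = saturated(rows)
print(saturated_nodes)
```

On the full listing, a check like this would single out every node except `aks-nodepool1-24491010-4`, while memory stays at or below 24% everywhere.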

Maybe for the next iteration we can increase the number of cores per student?

Thanks for the awesome work!!!

ryanlovett commented 5 years ago

Hi Simon,

We currently don't limit cores and assume jobs are memory bound. To get adequate resources, we'd have to put in a CPU core reservation and increase the number of nodes. Do you need this for this semester? If so, how many cores per student?
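
To illustrate why a CPU reservation implies more nodes, here is a rough sizing sketch in Python. Every number in it is an assumption: the 16,000 millicores per node is read off the listing earlier in the thread (about 16,000m at 100%), while the student count and the per-student reservation are hypothetical placeholders, since neither appears in this thread. It also ignores system pods and scheduling overhead.

```python
import math


def nodes_for_cpu_reservation(n_students, cpu_request_m, node_cpu_m):
    """Nodes needed so every student's CPU reservation fits.

    All quantities are in millicores to keep the arithmetic in integers.
    """
    return math.ceil(n_students * cpu_request_m / node_cpu_m)


NODE_CPU_M = 16_000    # inferred from the listing: ~16000m is ~100%
N_STUDENTS = 700       # hypothetical, not stated in the thread
CPU_REQUEST_M = 500    # hypothetical reservation of half a core

needed = nodes_for_cpu_reservation(N_STUDENTS, CPU_REQUEST_M, NODE_CPU_M)
print(f"nodes needed for the CPU reservation alone: {needed}")
```

Plugging in the real enrollment and whatever reservation is agreed on would give a first estimate of how far the pool has to grow.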

simon-mo commented 5 years ago

We don't need to limit cores, because the memory request already limits the maximum number of students placed on each node. If possible, we could change the node type to one with more cores. I believe AKS is experimenting with node pools, so we could change the node type near the end of term to adapt to the change in workload.
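
The reasoning above can be sketched as simple arithmetic in Python: the memory request caps how many students land on a node, so the CPU share per student is the node's core count divided by that cap. The node figures are rough estimates inferred from the listing (about 16 cores, and about 120,000 MiB since 18131Mi shows as 15%). The 2048 MiB per-student memory request and the 32-core alternative are hypothetical, as the thread gives neither.

```python
def students_per_node(node_mem_mib, mem_request_mib):
    """Maximum students a node can hold when memory requests are the limit."""
    return node_mem_mib // mem_request_mib


def cores_per_student(node_cores, students):
    """Average CPU share if every slot on the node is filled."""
    return node_cores / students


NODE_MEM_MIB = 120_000     # rough estimate inferred from the listing
MEM_REQUEST_MIB = 2048     # hypothetical per-student memory request

cap = students_per_node(NODE_MEM_MIB, MEM_REQUEST_MIB)

# Compare the current core count (inferred) with a hypothetical larger node.
for node_cores in (16, 32):
    share = cores_per_student(node_cores, cap)
    print(f"{node_cores} cores, {cap} students: {share:.2f} cores each")
```

With the student cap fixed by memory, doubling the cores per node doubles each student's share without changing how pods are packed.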