Closed shaneknapp closed 1 year ago
@ryanlovett @balajialg @yuvipanda
Based on the comments, I think you need to sum the values in column 3, then do column 2 - sum of column 3.
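a minimal sketch of that arithmetic (allocatable minus the summed requests), assuming the requests only use the Ki/Mi/Gi suffixes or plain bytes. to_ki and available_ki are hypothetical helper names made up for illustration; they are not part of values.yaml or the repo.

```shell
# Convert a Kubernetes memory quantity (e.g. 200Mi, 1Gi, 48404844Ki) to KiB.
# Assumption: only Ki, Mi, Gi, or plain bytes appear; anything else is an error.
to_ki() {
  awk -v q="$1" 'BEGIN {
    n = q; sub(/[A-Za-z]+$/, "", n)      # numeric part
    u = q; sub(/^[0-9.]+/, "", u)        # unit suffix
    if      (u == "Ki") f = 1
    else if (u == "Mi") f = 1024
    else if (u == "Gi") f = 1024 * 1024
    else if (u == "")   f = 1 / 1024     # plain bytes
    else { print "unsupported unit: " u > "/dev/stderr"; exit 1 }
    printf "%.0f\n", int(n * f)
  }'
}

# available_ki ALLOCATABLE REQUEST...
# Prints column 2 (allocatable) minus the sum of column 3 (requests), in KiB.
available_ki() {
  alloc=$(to_ki "$1") || return 1
  shift
  total=0
  for r in "$@"; do
    ki=$(to_ki "$r") || return 1
    total=$((total + ki))
  done
  echo "$((alloc - total))Ki"
}
```

with made-up numbers, available_ki 1000000Ki 100Mi 20Mi prints 877120Ki (1000000 - 102400 - 20480).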
some quick thangs @felder and i (mostly he) figured out:
the second command we run also includes ram requested by notebooks and pause, whatever that is. pause is transient and should be ignored imho, and the same w/notebooks.
for x in $(cat current-nodes.txt); do
  echo "$x" >> node-container-ram-req.txt
  kubectl get -A pod -l 'component!=user-placeholder' --field-selector spec.nodeName="$x" -o jsonpath='{range .items[*].spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\n"}{end}' | grep -Ev 'pause|notebook' | sort >> node-container-ram-req.txt
  echo >> node-container-ram-req.txt
done
output attached: node-container-ram-req.txt
so, all nodes (except data101, more details to come) have mem requests for the same 4 containers: fluentbit, fluentbit-gke, gke-metrics-agent and ip-masq-agent. data101 has these, plus mongodb and postgres.
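a rough sketch for totalling the per-container requests per node from a file shaped like node-container-ram-req.txt: a node-name line, then container<TAB>request lines, then a separator line. sum_requests_per_node is a hypothetical helper name, and the sketch assumes requests only use Ki/Mi/Gi or plain bytes.

```shell
# sum_requests_per_node FILE
# Prints "node<TAB>totalKi" for each node, in the order the nodes appear.
sum_requests_per_node() {
  awk -F '\t' '
    # Convert a memory quantity to KiB. Assumption: Ki, Mi, Gi, or plain
    # bytes; an empty request counts as 0 and any other suffix is treated
    # as plain bytes in this sketch.
    function to_ki(q,   n, u, f) {
      n = q; sub(/[A-Za-z]+$/, "", n)
      u = q; sub(/^[0-9.]+/, "", u)
      if      (u == "Ki") f = 1
      else if (u == "Mi") f = 1024
      else if (u == "Gi") f = 1048576
      else                f = 1 / 1024
      return int(n * f)
    }
    NF >= 2                 { total[node] += to_ki($2); next }  # container line
    $0 == "" || $0 == "\\n" { next }                            # separator line
                            { node = $0; order[++count] = node; total[node] += 0 }
    END {
      for (i = 1; i <= count; i++)
        printf "%s\t%.0fKi\n", order[i], total[order[i]]
    }
  ' "$1"
}
```

usage would be something like sum_requests_per_node node-container-ram-req.txt; each total is the number to subtract from that node's allocatable memory.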
i will update the PR (https://github.com/berkeley-dsep-infra/datahub/pull/4087) w/the new numbers.
this is done and merged!
Summary
we need to update node-placeholder/values.yaml to reflect the new deployment metric of one nodepool (generally) per course.
Acceptance criteria
when we've calculated the necessary maths to reflect our current reality.
Important information
i've manually launched a server in every pool, and performed the instructions as located in values.yaml.
first value is the node name, second is return value from k get node $NODE -o jsonpath='{.status.allocatable.memory}', and third is return value from k get -A pod -l 'component!=user-placeholder' --field-selector spec.nodeName=$NODE -o jsonpath='{.items[*].spec.containers[*].resources.requests.memory}'.
from the latter, i assume the first (large) column is the correct value to use but i really have no idea.
these numbers look big (i compared current datahub values vs what's in the file and #wtaf). i'd rather not proceed w/setting configs until i/we have time to think on this.
for instance, adding up the two big numbers for data100 returns 739,880,028Ki, and the current entry in values.yaml is 48,404,844Ki.
Tasks to complete
values.yaml