berkeley-dsep-infra / datahub

JupyterHubs for use by Berkeley enrolled students
https://docs.datahub.berkeley.edu
BSD 3-Clause "New" or "Revised" License

update `node-placeholder/values.yaml` for new nodepool deployment #4082

Closed shaneknapp closed 1 year ago

shaneknapp commented 1 year ago

Summary

we need to update node-placeholder/values.yaml to reflect the new deployment model of (generally) one nodepool per course.

Acceptance criteria

when we've calculated the necessary maths to reflect our current reality.

Important information

i've manually launched a server in every pool, and followed the instructions in values.yaml.

first value is the node name, second is the return value from `k get node $NODE -o jsonpath='{.status.allocatable.memory}'`, and third is the return value from `k get -A pod -l 'component!=user-placeholder' --field-selector spec.nodeName=$NODE -o jsonpath='{.items[*].spec.containers[*].resources.requests.memory}'`. from the latter, i assume the first (large) column is the correct value to use but i really have no idea.

gke-fall-2019-user-a11y-2023-01-06-4cec9591-tdxm,60055600Ki,536870912 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-astro-2023-01-05-e6debc6b-4cqj,60055600Ki,1073741824 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-biology-2023-01-04-9422e177-hz9z,60055600Ki,2155872256 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-cee-2023-01-05-69684dde-f5pc,60055600Ki,2147483648 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-data100-2022-10-28-e990d1fe-v6ml,203009116Ki,536870912 536870912 2147483648 100Mi 100Mi 60Mi 16Mi 48404844Ki 48404844Ki
gke-fall-2019-user-data101-2023-01-05-f0fdeb0e-bd4k,60055600Ki,536870912 64Mi 64Mi 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-data102-2023-01-05-e02d4850-rfpq,60055600Ki,1073741824 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-data8-2023-01-04-0aaf65df-cvhr,60055600Ki,536870912 536870912 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-datahub-2023-01-04-fc70ea5b-6d6h,60055600Ki,536870912 536870912 536870912 536870912 536870912 536870912 536870912 536870912 536870912 536870912 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-dlab-2023-01-05-cc638605-slkl,60055600Ki,4294967296 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-eecs-2023-01-05-25fdd7ca-zb66,60055600Ki,536870912 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-ischool-2023-01-05-762e2f24-cl28,60055600Ki,1073741824 1073741824 1073741824 1073741824 1073741824 1073741824 1073741824 100Mi 100Mi 60Mi 16Mi
gke-fall-2019-user-publichealth-2023--552e7538-2sxq,60055592Ki,100Mi 100Mi 60Mi 16Mi 536870912 536870912 536870912 536870912
gke-fall-2019-user-stat159-2023-01-05-eaf61ad8-v4wk,60055600Ki,100Mi 100Mi 60Mi 16Mi 8589934592
gke-fall-2019-user-stat20-2023-01-05-9cfe3aa4-jz8f,60055600Ki,100Mi 100Mi 60Mi 16Mi 1073741824
gke-fall-2019-user-r-2023-01-04-6e6a14cd-ndlr,60055600Ki,100Mi 100Mi 60Mi 16Mi 536870912
gke-fall-2019-user-small-courses-2023-f5d5ad27-v25p,60055600Ki,536870912 100Mi 100Mi 60Mi 16Mi

these numbers look big (i compared current datahub values vs what's in the file and #wtaf). i'd rather not proceed w/setting configs until i/we have time to think on this.

for instance, adding up the two big numbers for data100 returns 739,880,028Ki, and the current entry in values.yaml is 48,404,844Ki

shaneknapp commented 1 year ago

@ryanlovett @balajialg @yuvipanda

ryanlovett commented 1 year ago

Based on the comments, I think you need to sum the values in column 3, then do column 2 - sum of column 3.
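
As a rough sketch of that arithmetic (not the project's actual tooling), the bash snippet below converts each quantity to Ki before subtracting. A Kubernetes memory quantity with no suffix is in bytes, so 536870912 is 512Mi and cannot be added directly to a Ki figure. The helper names `to_ki` and `headroom_ki` are made up for illustration, and only the plain-byte, Ki, Mi and Gi forms that appear in this thread are handled. The example row is the a11y node from the table above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a Kubernetes memory quantity to Ki.
# Handles only bare bytes, Ki, Mi and Gi (the forms seen in this issue).
to_ki() {
  local q=$1
  case $q in
    *Ki) echo "${q%Ki}" ;;
    *Mi) echo $(( ${q%Mi} * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 )) ;;
    *[!0-9]*) echo "unsupported quantity: $q" >&2; return 1 ;;
    *)   echo $(( q / 1024 )) ;;   # bare number means bytes; integer division
  esac
}

# Hypothetical helper: column 2 minus the sum of column 3, all in Ki.
# First argument is the allocatable memory, the rest are the requests.
headroom_ki() {
  local allocatable=$1; shift
  local total=0 r
  for r in "$@"; do
    total=$(( total + $(to_ki "$r") ))
  done
  echo $(( $(to_ki "$allocatable") - total ))
}

# a11y row: 60055600Ki allocatable, five requests
headroom_ki 60055600Ki 536870912 100Mi 100Mi 60Mi 16Mi   # prints 59248688
```

Here the requests add up to 806912Ki (524288 + 102400 + 102400 + 61440 + 16384), which leaves 59248688Ki.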

shaneknapp commented 1 year ago

some quick thangs @felder and i (mostly he) figured out:

the second command we run also includes ram requested by notebooks and pause, whatever that is. pause is transient and should be ignored imho, and the same w/notebooks. this loop lists the per-container requests with those filtered out:

`for x in $(cat current-nodes.txt ); do echo $x >> node-container-ram-req.txt; kubectl get -A pod -l 'component!=user-placeholder' --field-selector spec.nodeName=$x -o jsonpath='{range .items[*].spec.containers[*]}{.name}{"\t"}{.resources.requests.memory}{"\n"}{end}' | egrep -v 'pause|notebook' | sort >> node-container-ram-req.txt; echo '\n' >> node-container-ram-req.txt; done`

output attached: node-container-ram-req.txt

so, all nodes (except data101, more details to come) have mem requests for the same 4 containers: fluentbit, fluentbit-gke, gke-metrics-agent and ip-masq-agent. data101 has these, plus mongodb and postgres.
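
A minimal bash sketch of how the filtered per-container list could be totalled, assuming input in the same `name<TAB>request` shape the loop writes. The function name `sum_requests_ki` is hypothetical. The pairing of container names with the 100Mi/100Mi/60Mi/16Mi values below is assumed for illustration, since the attached output is not reproduced here. The subtraction follows the "column 2 minus column 3" suggestion earlier in the thread; it is not necessarily what went into the PR.

```shell
#!/usr/bin/env bash
# Sum memory requests (in Ki) from "name<TAB>quantity" lines on stdin,
# skipping containers whose name contains "pause" or "notebook".
sum_requests_ki() {
  local name qty total=0
  while IFS=$'\t' read -r name qty; do
    case $name in *pause*|*notebook*) continue ;; esac
    case $qty in
      *Ki) total=$(( total + ${qty%Ki} )) ;;
      *Mi) total=$(( total + ${qty%Mi} * 1024 )) ;;
      *Gi) total=$(( total + ${qty%Gi} * 1024 * 1024 )) ;;
      '')  ;;                                   # no memory request set
      *)   total=$(( total + qty / 1024 )) ;;   # bare number means bytes
    esac
  done
  echo "$total"
}

# Illustrative input only: the name/value pairing is an assumption.
system_ki=$(printf '%s\t%s\n' \
  fluentbit 100Mi \
  fluentbit-gke 100Mi \
  gke-metrics-agent 60Mi \
  ip-masq-agent 16Mi \
  notebook 536870912 | sum_requests_ki)

echo "system requests: ${system_ki}Ki"                      # 282624Ki
echo "left after system pods: $(( 60055600 - system_ki ))Ki"  # 59772976Ki
```

With the notebook excluded, the four system containers request 276Mi (282624Ki) on a 60055600Ki node.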

i will update the PR (https://github.com/berkeley-dsep-infra/datahub/pull/4087) w/the new numbers.

shaneknapp commented 1 year ago

this is done and merged!