CERIT-SC / nbgrader-k8s

MIT License

nbgrader-exchange volume is never mounted #4

Open bpfrd opened 11 months ago

bpfrd commented 11 months ago

Hello,

I noticed that the exchange volume is defined but never mounted in the students' containers. Here is the output of kubectl get pod/jupyter-srtudent1 -o yaml:

best regards

apiVersion: v1
kind: Pod
metadata:
  annotations:
    hub.jupyter.org/username: srtudent1
  labels:
    app: jupyterhub
    chart: jupyterhub-3.0.3
    component: singleuser-server
    heritage: jupyterhub
    hub.jupyter.org/network-access-hub: "true"
    hub.jupyter.org/servername: ""
    hub.jupyter.org/username: srtudent1
    release: nbgrader
  name: jupyter-srtudent1
  namespace: default
  resourceVersion: "178209"
  uid: 66c587ec-152e-44ad-b375-158719f1a57c
spec:
  containers:
  - args:
    - jupyterhub-singleuser
    - --SingleUserNotebookApp.max_body_size=6291456000
    env:
    - name: CPU_GUARANTEE
      value: "0.2"
    - name: CPU_LIMIT
      value: "1.0"
    - name: JPY_API_TOKEN
      value: 343dfa0290e64d6096fcba9bbe6b69bd
    - name: JUPYTERHUB_ACTIVITY_URL
      value: http://hub:8081/hub/api/users/srtudent1/activity
    - name: JUPYTERHUB_ADMIN_ACCESS
      value: "1"
    - name: JUPYTERHUB_API_TOKEN
      value: 343dfa0290e64d6096fcba9bbe6b69bd
    - name: JUPYTERHUB_API_URL
      value: http://hub:8081/hub/api
    - name: JUPYTERHUB_BASE_URL
      value: /
    - name: JUPYTERHUB_CLIENT_ID
      value: jupyterhub-user-srtudent1
    - name: JUPYTERHUB_DEBUG
      value: "1"
    - name: JUPYTERHUB_DEFAULT_URL
      value: /lab
    - name: JUPYTERHUB_HOST
    - name: JUPYTERHUB_OAUTH_ACCESS_SCOPES
      value: '["access:servers!server=srtudent1/", "access:servers!user=srtudent1"]'
    - name: JUPYTERHUB_OAUTH_CALLBACK_URL
      value: /user/srtudent1/oauth_callback
    - name: JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES
      value: '[]'
    - name: JUPYTERHUB_OAUTH_SCOPES
      value: '["access:servers!server=srtudent1/", "access:servers!user=srtudent1"]'
    - name: JUPYTERHUB_SERVER_NAME
    - name: JUPYTERHUB_SERVICE_PREFIX
      value: /user/srtudent1/
    - name: JUPYTERHUB_SERVICE_URL
      value: http://0.0.0.0:8888/user/srtudent1/
    - name: JUPYTERHUB_USER
      value: srtudent1
    - name: JUPYTER_IMAGE
      value: bpfrd/nbgrader-student:latest
    - name: JUPYTER_IMAGE_SPEC
      value: bpfrd/nbgrader-student:latest
    - name: MEM_GUARANTEE
      value: "2147483648"
    - name: MEM_LIMIT
      value: "2147483648"
    image: bpfrd/nbgrader-student:latest
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command:
          - bash
          - -c
          - |
            echo -e "envs_dirs:\n  - /home/jovyan/my-conda-envs/" > /home/jovyan/.condarc;
    name: notebook
    ports:
    - containerPort: 8888
      name: notebook-port
      protocol: TCP
    resources:
      limits:
        cpu: "1"
        memory: "2147483648"
      requests:
        cpu: 200m
        memory: "2147483648"
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      runAsUser: 1000
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/jupyter/
      name: nbgrader-config-global
      readOnly: true
    - mountPath: /home/jovyan
      name: srtudent1-home-default-pv
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: master-01.novalocal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 100
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
  - effect: NoSchedule
    key: hub.jupyter.org_dedicated
    operator: Equal
    value: user
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: nbgrader-config-global
    name: nbgrader-config-global
  - name: srtudent1-home-default-pv
    persistentVolumeClaim:
      claimName: srtudent1-home-default
  - name: nbgrader-exchange
    persistentVolumeClaim:
      claimName: nbgrader-exchange
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-12-18T20:24:05Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-12-18T20:24:06Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-12-18T20:24:06Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-12-18T20:24:05Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://8c0c7fa2ce805cf2ea182b0d9fdac85498bb64a0f29a689d9d8e6cfa12fdc760
    image: docker.io/bpfrd/nbgrader-student:latest
    imageID: docker.io/bpfrd/nbgrader-student@sha256:598fb4a896dee9c5e533eaa591ddb6b8c0565b333a5178949a313ecc33c85dd4
    lastState: {}
    name: notebook
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2023-12-18T20:24:06Z"
  hostIP: 10.0.4.162
  phase: Running
  podIP: 10.42.0.168
  podIPs:
  - ip: 10.42.0.168
  qosClass: Burstable
  startTime: "2023-12-18T20:24:05Z"
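
For comparison, I would have expected an extra entry like the following under the container's volumeMounts (the mount path here is just nbgrader's default exchange directory and may differ from what this chart intends):

    - mountPath: /srv/nbgrader/exchange
      name: nbgrader-exchange

The volume itself already appears under spec.volumes, so presumably only the mount is missing. If this deployment is based on the zero-to-jupyterhub Helm chart, I assume it could be added via singleuser.storage.extraVolumeMounts.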
bpfrd commented 11 months ago

I also get the error below while the spawner is loading:

2023-12-18T20:09:11.656075Z [Warning] 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

2023-12-18T20:11:19Z [Normal] Stopping container notebook

Event log:

Server requested
2023-12-18T19:28:23.407289Z [Warning] 0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
2023-12-18T19:55:28.733675Z [Warning] 0/1 nodes are available: persistentvolumeclaim "nbgrader-exchange" is being deleted. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
2023-12-18T20:06:51.675011Z [Normal] Successfully assigned default/jupyter-admin to master-01.novalocal
2023-12-18T20:06:52Z [Normal] Container image "bpfrd/nbgrader-student:latest" already present on machine
2023-12-18T20:06:52Z [Normal] Created container notebook
2023-12-18T20:06:52Z [Normal] Started container notebook
2023-12-18T20:11:19Z [Normal] Stopping container notebook

I was also wondering how I should set the resource parameters if I have 16Gi of memory, 4 CPUs, and, let's say, 10 students.

best regards,

KrKOo commented 11 months ago

Hey @bpfrd

0/1 nodes are available: persistentvolumeclaim "nbgrader-exchange" is being deleted. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling

While you were testing something, you probably tried to delete the "nbgrader-exchange" PVC. The problem is that the PVC will not actually be deleted until all jobs/pods that use it are deleted as well. You then probably tried to start a new pod that wants to mount the PVC, which is still marked for deletion. If you are just testing, delete the PVC and also all the pods it is used by.
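
A minimal cleanup sketch, assuming the default namespace and the names from this thread (adjust to whatever actually uses the PVC):

kubectl delete pod jupyter-srtudent1 -n default
kubectl delete pvc nbgrader-exchange -n default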

I was wondering how I can manage the resource parameters if I have 16Gi memory and 4 cpus and let's say 10 students

0/1 nodes are available: 1 Insufficient memory. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod

You will have to set resources.limits (the maximum resources that can be allocated to each student) correctly. If you only have 10 students, something like this should work for your setup:

resources:
  limits:
    cpu: 300m
    memory: 1Gi

Keep in mind that you also have to adjust the resource limits for the "hub" and "proxy" pods. The sum of the resource limits of all your pods (10 singleuser + hub + proxy + others, if you have any) should not exceed 4 CPUs and 16Gi of memory.
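
For example, with the limits above, 10 student servers account for at most 10 × 300m = 3 CPU and 10 × 1Gi = 10Gi of memory, which leaves roughly 1 CPU and 6Gi for the hub, the proxy, and the node itself. If the deployment is based on the zero-to-jupyterhub Helm chart (an assumption on my side), the per-student limits can be set in values.yaml along these lines:

singleuser:
  cpu:
    limit: 0.3
    guarantee: 0.2
  memory:
    limit: 1G
    guarantee: 512M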