Open eeholmes opened 6 months ago
- display_name: NVIDIA Tesla T4, 28 GB, 4 CPUs
  description: "Start a container on a dedicated node with a GPU"
  slug: "gpu"
  profile_options:
    image:
      display_name: Image
      choices:
        pytorch:
          display_name: Pangeo PyTorch ML Notebook
          default: true
          slug: "pytorch"
          kubespawner_override:
            image: "quay.io/pangeo/pytorch-notebook:2023.09.19"
  kubespawner_override:
    environment:
      NVIDIA_DRIVER_CAPABILITIES: compute,utility
    mem_limit: null
    mem_guarantee: 14G
    node_selector:
      node.kubernetes.io/instance-type: Standard_NC4as_T4_v3
Notes on Pangeo Deep-Learning
https://medium.com/pangeo/deep-learning-with-gpus-on-pangeo-9466e25bfd74
Scott et al. debugging the setup on AWS: https://github.com/pangeo-data/pangeo-cloud-federation/issues/490
https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?tabs=add-ubuntu-gpu-node-pool
Options with 1 GPU and 16 GB RAM:
- AWS: g4dn.xlarge, $385/mo
- GCP: n1-standard-4 with an nvidia-tesla-t4 attached (n1 family)
- Azure: Standard_NC4as_T4_v3, $383/mo
https://www.earthdata.nasa.gov/esds/competitive-programs/access/pangeo-ml
https://hub.docker.com/r/pangeo/ml-notebook/tags
Instructions: https://z2jh.jupyter.org/en/latest/jupyterhub/customizing/user-resources.html#set-user-gpu-guarantees-limits
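
For reference, the z2jh page linked above requests a GPU through `extra_resource_limits` inside a profile's `kubespawner_override`. Below is a minimal sketch of how that might look for the GPU profile in this issue. It assumes the GPU node pool runs the NVIDIA device plugin, so that GPUs are exposed as the `nvidia.com/gpu` resource. Whether this hub needs the limit in addition to the `node_selector` has not been confirmed here.

```yaml
# Sketch only: a profile entry with an explicit GPU limit added.
# Assumption: the NVIDIA device plugin is running on the GPU node pool,
# so the nodes advertise the "nvidia.com/gpu" resource.
- display_name: NVIDIA Tesla T4, 28 GB, 4 CPUs
  slug: "gpu"
  kubespawner_override:
    extra_resource_limits:
      nvidia.com/gpu: "1"   # request one GPU for the user pod
    node_selector:
      node.kubernetes.io/instance-type: Standard_NC4as_T4_v3
```

With a limit like this, the scheduler should only place the user pod on a node that has a free GPU, rather than relying on the instance-type selector alone.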