NTHU-LSALAB / KubeShare

Share GPU between Pods in Kubernetes
Apache License 2.0

KubeShare

🎉🎉 KubeShare 2.0 is now available; version 1.0 will be deprecated.

A topology-aware and heterogeneous-resource-aware scheduler for fractional GPU allocation in Kubernetes clusters.
KubeShare 2.0 is designed around the Kubernetes scheduling framework.

Note that KubeShare 1.0 is deprecated. Refer to the KubeShare 1.0 branch for the old version.

Features

Prerequisite & Limitation

Deployment

  1. Deploy Components

Workloads

Label description

Because Kubernetes forbids floating-point requests for custom devices, GPU resource usage is defined in Pod labels instead.
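
Since label values are always strings, anything that consumes these labels has to parse them back into numbers. The following is a minimal sketch of that step, not KubeShare's own code: `parse_gpu_labels` is a hypothetical helper, and the rule that the request must be positive and no larger than the limit is an assumption made for illustration.

```python
from typing import Dict, Tuple

REQUEST_KEY = "sharedgpu/gpu_request"
LIMIT_KEY = "sharedgpu/gpu_limit"


def parse_gpu_labels(labels: Dict[str, str]) -> Tuple[float, float]:
    """Hypothetical helper: read the fractional GPU request and limit from Pod labels.

    Assumption (not taken from KubeShare): 0 < request <= limit.
    """
    request = float(labels[REQUEST_KEY])  # label values are strings, e.g. "0.5"
    limit = float(labels[LIMIT_KEY])
    if not 0.0 < request <= limit:
        raise ValueError(f"invalid GPU labels: request={request}, limit={limit}")
    return request, limit


# Labels as they would appear under metadata.labels in a Pod manifest.
pod_labels = {
    "sharedgpu/gpu_request": "0.5",
    "sharedgpu/gpu_limit": "1.0",
}
print(parse_gpu_labels(pod_labels))
```

A missing label surfaces as a `KeyError` and a non-numeric value as a `ValueError`, so a malformed manifest fails early in this sketch rather than being read as zero.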

Pod specification

apiVersion: v1
kind: Pod
metadata:
  name: mnist
  labels:
    "sharedgpu/gpu_request": "0.5"
    "sharedgpu/gpu_limit": "1.0"
    "sharedgpu/gpu_model": "NVIDIA-GeForce-GTX-1080"
spec:
  schedulerName: kubeshare-scheduler
  restartPolicy: Never
  containers:
    - name: pytorch
      image: riyazhu/mnist:test
      command: ["sh", "-c", "sleep infinity"]
      imagePullPolicy: Always #IfNotPresent

Job specification

apiVersion: batch/v1
kind: Job
metadata:
  name: lstm-g
  labels:
    app: lstm-g
spec:
  completions: 5
  parallelism: 5
  template:
    metadata:
      name: lstm-o
      labels:
        "sharedgpu/gpu_request": "0.5"
        "sharedgpu/gpu_limit": "1.0"
        "sharedgpu/group_name": "a"
        "sharedgpu/group_headcount": "5"
        "sharedgpu/group_threshold": "0.2"
        "sharedgpu/priority": "100"
    spec:
      schedulerName: kubeshare-scheduler
      restartPolicy: Never
      containers:
        - name: pytorch
          image: riyazhu/lstm-wiki2:test
          # command: ["sh", "-c", "sleep infinity"]
          imagePullPolicy: IfNotPresent
          volumeMounts:
          - name: datasets
            mountPath: "/datasets/"
      volumes:
        - name: datasets
          hostPath:
            path: "/home/riya/experiment/datasets/"
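
As a rough sizing aid for the Job above: with `parallelism: 5` and each Pod labeled with a `gpu_request` of "0.5", the Pods together request 2.5 GPUs when all of them run at once. The sketch below only does that arithmetic; `total_gpu_request` is a hypothetical helper and does not model how KubeShare actually places Pods on devices.

```python
from fractions import Fraction


def total_gpu_request(parallelism: int, gpu_request: str) -> Fraction:
    """Hypothetical helper: sum of fractional GPU requests across concurrently running Pods.

    Fraction parses the label string exactly, avoiding float rounding for values like "0.1".
    """
    if parallelism < 0:
        raise ValueError("parallelism must be non-negative")
    return parallelism * Fraction(gpu_request)


# Values taken from the Job specification above.
total = total_gpu_request(parallelism=5, gpu_request="0.5")
print(float(total))
```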

Build

Compiling

git clone https://github.com/NTHU-LSALAB/KubeShare.git
cd KubeShare
make

Build & Push images

make build-image
make push-image

Directories & Files

GPU Isolation Library

Please refer to Gemini.

TODO

Issues

If you run into any problems, please open a GitHub issue. Thanks.