Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Multiple Scheduler Bugs: Deployment Update Resource Allocation and GPU Utilization #303

Open michael-nammi opened 4 months ago

michael-nammi commented 4 months ago

Environment

Kubernetes version: v1.27.9
HAMi version: v2.3.9

Bug 1: Possible Scheduler Bug When Updating Deployment with Insufficient Resources

Encountered a scheduler bug when updating a Deployment's resource requirements beyond the available capacity of a Kubernetes cluster with heterogeneous memory and GPU resources.

Steps to reproduce the issue

  1. Pre-conditions:
    • Node 1: 4GiB Memory, 1 GPU
    • Node 2: 4GiB Memory, 1 GPU
    • Node 3: 16GiB Memory, 2 GPUs (each GPU with 16GiB)
  2. Create Deployment A:
    • Replicas: 1
    • Memory requirement: 16GiB
    • GPU requirement: 2
  3. Create Deployment B:
    • Replicas: 1
    • Memory requirement: 4GiB
    • GPU requirement: 1
  4. Delete Deployment A
  5. Modify Deployment B
    • Change replicas to 3
    • Change memory requirement to 8GiB
    • Change GPU requirement to 2

      Expected Behavior

      The update should fail because there is not enough memory and GPUs available in the cluster to satisfy the requirements of 3 replicas of Deployment B with the specified resources.

    • Node 1: 4GiB Memory occupied by the pre-existing resources of Deployment B
    • Node 2: Unchanged (idle)
    • Node 3: 2 replicas of Deployment B fully occupy the memory of both GPUs (8GiB on each of the two 16GiB GPUs per replica)
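The capacity arithmetic behind this expectation can be sketched with a small first-fit simulation (a simplification of HAMi's real scheduling logic; it assumes a pod's GPUs must all come from one node and that each assigned GPU needs the full per-GPU memory free):

```python
# Hypothetical first-fit placement check -- NOT HAMi's actual code.
def place(nodes, replicas, gpus_per_pod, mem_per_gpu):
    """Try to place `replicas` pods; return how many fit."""
    placed = 0
    for _ in range(replicas):
        for free in nodes.values():
            # GPUs on this node with enough free memory
            fit = [i for i, mem in enumerate(free) if mem >= mem_per_gpu]
            if len(fit) >= gpus_per_pod:
                for i in fit[:gpus_per_pod]:
                    free[i] -= mem_per_gpu
                placed += 1
                break
    return placed

# Free GPU memory per node, in MiB (node 1's GPU still holds the
# original 4GiB replica of Deployment B).
nodes = {
    "node1": [0],                    # 4GiB GPU, used by the old replica
    "node2": [4 * 1024],             # 4GiB GPU, idle
    "node3": [16 * 1024, 16 * 1024], # two 16GiB GPUs, idle
}

print(place(nodes, 3, 2, 8 * 1024))  # 2 -- the third replica cannot fit
print(nodes["node3"])                # [0, 0] -- both GPUs fully occupied
```

Only node 3 has two GPUs, and each 16GiB GPU can absorb exactly two 8GiB slices, so two replicas exhaust both GPUs and the third must stay pending.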

      Actual Behavior

      The update fails, but the node resource allocation is incorrectly reported:

    • Node 1: 4GiB Memory
    • Node 2: Unchanged (idle)
    • Node 3: Resources are reported as 8GiB and 12GiB, which is inconsistent with the expected result of both GPUs having their full 16GiB allocated

      Prometheus Metrics

      (screenshot of Prometheus metrics showing the reported allocations)

Bug 2: Incorrect GPU Utilization

Encountered a scheduler bug when creating a Deployment whose GPU-core requirement exceeds the maximum utilization of 100% in a Kubernetes cluster with heterogeneous memory and GPU resources.

Steps to reproduce the issue

  1. Pre-conditions:
    • Node 1: 4GiB Memory, 1 GPU (Max Utilization: 100%)
    • Node 2: 4GiB Memory, 1 GPU (Max Utilization: 100%)
    • Node 3: 16GiB Memory, 2 GPUs (each GPU with 16GiB Memory and Max Utilization: 100%)
  2. Create Deployment A:
    • Replicas: 1
    • Memory requirement: 4GiB
    • GPU requirement: 1
    • GPUcores requirement: 120 (more than 100% GPU utilization, taking 100 as the maximum)

      Expected Behavior

      The deployment should fail to be scheduled due to the GPU utilization requirement exceeding the maximum limit of 100%.

    • Node 1: Memory should remain unallocated (4GiB)
    • Node 2: Memory should remain unallocated (4GiB)
    • Node 3: Both GPUs should remain unallocated (16GiB + 100%, and 16GiB + 100%)
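The admission check the reporter expects can be sketched as follows (assuming, as the issue does, that 100 gpucores equals 100% of one physical GPU; `validate_gpucores` is a hypothetical helper, not HAMi's actual code):

```python
MAX_GPU_CORES = 100  # 100 gpucores == 100% utilization of one GPU

def validate_gpucores(requested: int) -> bool:
    """A pod should only be schedulable if its per-GPU core request
    does not exceed one full GPU's share."""
    return 0 < requested <= MAX_GPU_CORES

print(validate_gpucores(100))  # True  -- a full GPU is allowed
print(validate_gpucores(120))  # False -- should be rejected, not scheduled
```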

      Actual Behavior

      The deployment is incorrectly scheduled with the following resource allocation:

    • Node 1: Unchanged (4GiB Memory idle)
    • Node 2: Unchanged (4GiB Memory idle)
    • Node 3: Resources are reported incorrectly:
      • First GPU: Appears as if 4GiB Memory + 100% Utilization has been allocated to Deployment A (there should be no allocation at all)
      • Second GPU: Unallocated (16GiB Memory and 100% Utilization idle)
github-actions[bot] commented 4 months ago

Hi @michael-nammi, Thanks for opening an issue! We will look into it as soon as possible.

wawa0210 commented 4 months ago

@michael-nammi It would be great if you could provide the YAML for each deployment used in the tests; that would speed up our troubleshooting.

michael-nammi commented 4 months ago

Here are the YAML files for the deployments:

Test for bug 1

Steps:
  1. Create Deployment A:
    • Replicas: 1
    • Memory requirement: 16GiB
    • GPU requirement: 2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: ubuntu-container
        image: ubuntu:18.04
        command: ["bash", "-c", "sleep 86400"]
        resources:
          limits:
            nvidia.com/gpu: 2
            nvidia.com/gpumem: 16384
  2. Create Deployment B:
    • Replicas: 1
    • Memory requirement: 4GiB
    • GPU requirement: 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: ubuntu-container
        image: ubuntu:18.04
        command: ["bash", "-c", "sleep 86400"]
        resources:
          limits:
            nvidia.com/gpu: 1
            nvidia.com/gpumem: 4096
  3. Delete Deployment A

    • kubectl delete deployment deployment-a
  4. Modify Deployment B

    • Change replicas to 3
    • Change memory requirement to 8GiB
    • Change GPU requirement to 2
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-b
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: ubuntu-container
        image: ubuntu:18.04
        command: ["bash", "-c", "sleep 86400"]
        resources:
          limits:
            nvidia.com/gpu: 2
            nvidia.com/gpumem: 8192

Test for bug 2

Steps:
  1. Create Deployment A:
    • Replicas: 1
    • Memory requirement: 4GiB
    • GPU requirement: 1
    • GPUcores requirement: 120
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment-a
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu
  template:
    metadata:
      labels:
        app: gpu
    spec:
      containers:
      - name: ubuntu-container
        image: ubuntu:18.04
        command: ["bash", "-c", "sleep 86400"]
        resources:
          limits:
            nvidia.com/gpu: 1
            nvidia.com/gpumem: 4096
            nvidia.com/gpucores: 120