NVIDIA / aistore

AIStore: scalable storage for AI applications
https://aistore.nvidia.com
MIT License

OOM killed error of Operator Controller #128

Closed by superleo 1 year ago

superleo commented 1 year ago

Problem: The operator controller is OOM-killed in my k8s environment.

Cause:

  1. The controller manager's memory limit is set to 30Mi in operator/config/manager/manager.yaml:

    
        resources:
          limits:
            cpu: 100m
            memory: 30Mi
          requests:
            cpu: 100m
            memory: 20Mi
  2. The controller's memory usage averages about 55Mi in the cluster, which is above the 30Mi limit.

Solution: Raise the default memory limit to 100Mi.
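For reference, the proposed change might look like the fragment below. This is a minimal sketch: only the memory limit is changed to the 100Mi value suggested here, and the CPU values and memory request are assumed to stay as they are in the current manager.yaml. The change actually made in the pull request may differ.

```yaml
# Sketch of the resources section in operator/config/manager/manager.yaml.
# Assumption: only the memory limit changes; all other values are unchanged.
resources:
  limits:
    cpu: 100m
    memory: 100Mi   # raised from 30Mi; observed usage averages about 55Mi
  requests:
    cpu: 100m
    memory: 20Mi    # unchanged here (assumption); still below observed usage
```

With the memory request left at 20Mi, the scheduler reserves less than the controller typically uses, so raising the request toward the observed 55Mi may also be worth considering.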

superleo commented 1 year ago

Created a pull request for it: https://github.com/NVIDIA/ais-k8s/pull/2 @alex-aizman