Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io
Apache License 2.0

Should nvidia-vgpu-webhook ignore the kube-system namespace? #83

Closed panpan0000 closed 10 months ago

panpan0000 commented 1 year ago

We register a MutatingWebhookConfiguration whose scope works as a blacklist (opt-out): only namespaces carrying the label 4pd.io/webhook: ignore are exempt from it.
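To make the opt-out behaviour concrete, here is a minimal sketch in Python of how such a selector decides which namespaces are intercepted. The exact selector in the chart is not quoted in this issue, so the `NotIn` expression below is an assumption reconstructed from the description; `selector_matches` is a hypothetical helper that mimics Kubernetes label-selector semantics and is not part of any library.

```python
# Assumed shape of the webhook's namespaceSelector (reconstructed from the
# description in this issue, not copied from the project's manifests).
NAMESPACE_SELECTOR = {
    "matchExpressions": [
        {"key": "4pd.io/webhook", "operator": "NotIn", "values": ["ignore"]},
    ]
}


def selector_matches(selector, labels):
    """Return True if the namespace labels satisfy every matchExpression."""
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            # A namespace without the key also satisfies NotIn.
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


if __name__ == "__main__":
    namespaces = {
        "default": {},
        "kube-system": {"kubernetes.io/metadata.name": "kube-system"},
        "opted-out": {"4pd.io/webhook": "ignore"},
    }
    for name, labels in namespaces.items():
        intercepted = selector_matches(NAMESPACE_SELECTOR, labels)
        print(f"{name}: intercepted={intercepted}")
```

Under this assumption, kube-system is intercepted like any other namespace unless someone labels it by hand, which is what sets up the edge case described next.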

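Edge case: if, for example, calico-kube-controller restarts, the cluster deadlocks. The calico-kube-controller ReplicaSet gets stuck on failed calling webhook "vgpu.4pd.io": failed to call webhook. But because the Calico network is down, the webhook cannot be reached, and Calico cannot come back up until the webhook call succeeds, so it is a deadlock.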
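One possible way to break this cycle, in line with the question in the title, is to keep kube-system out of the webhook's scope by default. The sketch below is an assumption, not the project's actual fix: it relies on the `kubernetes.io/metadata.name` label that Kubernetes 1.21+ sets automatically on every namespace, and `build_namespace_selector` is a hypothetical helper that only assembles the selector fragment.

```python
import json


def build_namespace_selector(excluded_namespaces):
    """Build a namespaceSelector that skips opted-out and named namespaces."""
    expressions = [
        # Existing opt-out label described in this issue.
        {"key": "4pd.io/webhook", "operator": "NotIn", "values": ["ignore"]},
    ]
    if excluded_namespaces:
        # Assumption: the cluster sets kubernetes.io/metadata.name on
        # namespaces automatically (Kubernetes 1.21 and later).
        expressions.append({
            "key": "kubernetes.io/metadata.name",
            "operator": "NotIn",
            "values": sorted(excluded_namespaces),
        })
    return {"matchExpressions": expressions}


if __name__ == "__main__":
    # Print the fragment that would go under webhooks[].namespaceSelector.
    print(json.dumps(build_namespace_selector({"kube-system"}), indent=2))
```

All expressions in a selector are ANDed, so a namespace is intercepted only if it is neither labeled with the opt-out label nor named kube-system.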

lengrongfu commented 11 months ago

Perhaps we should use a lenient policy here, failurePolicy=Ignore, so that a webhook failure does not affect other functionality. @archlitchi what do you think?
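For reference, a simplified model of what the two `failurePolicy` values mean when the webhook cannot be called. This is a sketch of the documented Kubernetes behaviour, not the API server's code; `admit` is a hypothetical function, and the webhook is assumed to admit every request it actually receives.

```python
# Fragment of a webhook entry with the lenient policy suggested above.
# The name is taken from the error message quoted in this issue.
WEBHOOK = {"name": "vgpu.4pd.io", "failurePolicy": "Ignore"}


def admit(webhook_reachable, failure_policy):
    """Model the outcome of one admission request for an in-scope pod."""
    if failure_policy not in ("Fail", "Ignore"):
        raise ValueError(f"unknown failurePolicy: {failure_policy}")
    if webhook_reachable:
        # Assumption: the webhook mutates and admits the request.
        return "allowed (mutated)"
    if failure_policy == "Ignore":
        # The call error is ignored and the request continues unmutated.
        return "allowed (webhook skipped)"
    # failurePolicy=Fail: the call error rejects the request.
    return "rejected"


if __name__ == "__main__":
    for policy in ("Fail", "Ignore"):
        outcome = admit(webhook_reachable=False, failure_policy=policy)
        print(f"network down, failurePolicy={policy}: {outcome}")
```

The trade-off is that with Ignore, pods created while the webhook is unreachable are admitted without mutation, so any GPU pod created during an outage would miss whatever changes the webhook normally applies.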