-
/kind bug
**What steps did you take and what happened:**
I tried to integrate Ray Serve into the custom predictor of my InferenceService, following the documentation below as is:
https://kserve.github.i…
-
MLX is a new ML framework specifically designed to run on Apple silicon: https://github.com/ml-explore/mlx
It has some differences compared to PyTorch with the `mps` backend: https://github.com/ml-explo…
-
![image](https://github.com/user-attachments/assets/ad4383c2-4cd5-40a9-8c3a-921268553e42)
If the GPU and NIC are on the same PCIe bridge or their topology distance is at least `PHB`, then communica…
-
AKS/Kubernetes moved Nvidia GPU resources from being an ‘alpha’ resource to a stable release, and changed the name of the resource on the cluster. Instead of requesting ‘alpha.kubernetes.io/nvidia-gpu…
-
europlots provider v0.4.6 (`akash18ga02jzaq8cw52anyhzkwta5wygufgu6zsz6xc`)
RPC node is 0.26.1 (we have tried different RPC nodes too)
```
I[2023-09-28|09:42:54.027] order detected …
-
#### Summary
The GPU add-on is not available on ARM MicroK8s
#### What Should Happen Instead?
It should be possible to install the GPU add-on
#### Reproduction Steps
Ubuntu amd64
```bash
$ uname -p
x86_64
$ m…
-
Allocatable GPU values are not correct after configuring time slicing
```
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
data:
  any: |-
    version: v1
    flags:
      …
-
### What is the version?
3.4.2
### What happened?
Labels like `namespace` and `pod` are sometimes missing from metrics that should contain them, like `DCGM_FI_DEV_FB_USED`
### What did you expec…
-
**What happened**:
After pulling a large (22GB) image for deep learning training / evaluation onto a GPU (Nvidia T4) node in AKS, the node stops any communication with the cluster. This means the…
-
### Description
**Observed Behavior**:
1. Start with zero NVIDIA GPU nodes in the cluster.
2. Configure a node pool to automatically provision GPU nodes on request. (See config below)
3. Launch …