upload gpu topology info to node annotation

NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes

Apache License 2.0


upload gpu topology info to node annotation #465

Open lengrongfu opened 11 months ago

lengrongfu commented 11 months ago

1. Issue or feature description

We found that the current GPU selection algorithm is besteffort_policy. We would like to upload each node's GPU topology info to a node annotation so that, when a cluster has multi-GPU nodes, kube-scheduler can pick the globally best placement.

lengrongfu commented 11 months ago

/assign

lengrongfu commented 10 months ago

@kerthcet

kerthcet commented 10 months ago

Should this issue be put under https://github.com/NVIDIA/gpu-feature-discovery?

kerthcet commented 8 months ago

This is how we expose the GPU topo matrix now: [[-1, 20, 20, 20], [20, -1, 20, 20], [20, 20, -1, 20], [20, 20, 20, -1]]. It generally follows the definitions at https://github.com/NVIDIA/go-gpuallocator/blob/b0577847cf04c3e928488dfe90830a2c5a01706b/internal/links/device.go#L31-L57

cc @ArangoGutierrez @elezar @klueska Although we hope to move forward with DRA in the future, many users are still in the old world of device plugins. I can help with this if needed. Thanks.
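As a rough illustration of the proposal, the sketch below serializes a pairwise matrix like the one above into a single node annotation value and reads it back. The annotation key `example.com/gpu-topology`, the function names, and the symmetry check are assumptions made for this example; the thread does not specify a key or a wire format.

```python
import json

# Hypothetical annotation key; the thread does not name one.
ANNOTATION_KEY = "example.com/gpu-topology"


def encode_topology(matrix):
    """Validate a pairwise GPU matrix and return it as an annotation dict.

    Assumptions for this sketch: the matrix is square, the diagonal holds
    -1 (a GPU paired with itself), and link values are symmetric.
    """
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square")
    for i in range(n):
        if matrix[i][i] != -1:
            raise ValueError("diagonal entries must be -1")
        for j in range(i + 1, n):
            if matrix[i][j] != matrix[j][i]:
                raise ValueError("matrix must be symmetric")
    # Compact separators keep the annotation value short.
    return {ANNOTATION_KEY: json.dumps(matrix, separators=(",", ":"))}


def decode_topology(annotations):
    """Parse the matrix back out of a node's annotation dict."""
    return json.loads(annotations[ANNOTATION_KEY])
```

A scheduler-side component could call `decode_topology` on the node's annotations before scoring. Annotations are used here rather than labels because their values may hold arbitrary strings such as JSON.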

kerthcet commented 8 months ago

Furthermore, we hope to expose GPU usage for smarter scheduling as well, but NFD/GFD seems to report at intervals (60s by default), which is not a good fit here. Any suggestions? What we do today is report it via the device plugin itself.

elezar commented 8 months ago

> Should this issue be put under https://github.com/NVIDIA/gpu-feature-discovery?

We are in the process of migrating GPU Feature discovery to this repository to streamline our releases.

elezar commented 8 months ago

> Furthermore, we hope to expose GPU usage for smarter scheduling as well, but NFD/GFD seems to report at intervals (60s by default), which is not a good fit here. Any suggestions? What we do today is report it via the device plugin itself.

I don't know whether labels are the right place to expose usage information. This sounds more like something that should be made available by DCGM or another component.

I would expect labels to be relatively static due to the impact they have on decisions such as placement and scheduling.

@kerthcet when you mention exposing the topology, how do you translate this to a label? Are labels intended to encode data this way?

kerthcet commented 8 months ago

> I don't know whether labels are the right place to expose usage information. This sounds more like something that should be made available by DCGM or another component.

Thanks for the advice; we're exploring reading it from Prometheus.

kerthcet commented 8 months ago

> When you mention exposing the topology, how do you translate this to a label? Are labels intended to encode data this way?

This is how it looks in our system right now: [[-1, 20, 20, 20], [20, -1, 20, 20], [20, 20, -1, 20], [20, 20, 20, -1]]. Because we use the topology for scheduling, plain numbers are enough for us for scoring. For display purposes, I guess it's a different matter; it could look like a truncated version of the nvidia-smi topo output, which is familiar to users. We can have a conversion function internally.
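To make the two uses mentioned above concrete, here is a minimal sketch of scoring a candidate GPU set from such a matrix and of rendering the matrix as a table for display. It assumes that a higher pair value means a better link and that a set's score is the sum of its pairwise values. Neither assumption is confirmed in the thread, and the function names are hypothetical.

```python
from itertools import combinations


def set_score(matrix, gpus):
    """Sum the pairwise values over every pair of GPUs in the set."""
    return sum(matrix[a][b] for a, b in combinations(gpus, 2))


def best_gpu_set(matrix, free, count):
    """Return the highest-scoring set of `count` GPUs among `free` indices.

    Exhaustive search; fine for the handful of GPUs on a single node.
    Ties resolve to the lexicographically smallest set.
    """
    if count > len(free):
        return None
    return max(
        combinations(sorted(free), count),
        key=lambda gpus: set_score(matrix, gpus),
    )


def render(matrix):
    """Render the matrix as a tab-separated table, with X on the diagonal."""
    n = len(matrix)
    lines = ["\t".join([""] + [f"GPU{i}" for i in range(n)])]
    for i, row in enumerate(matrix):
        cells = ["X" if i == j else str(v) for j, v in enumerate(row)]
        lines.append("\t".join([f"GPU{i}"] + cells))
    return "\n".join(lines)
```

With the uniform matrix from this comment every set of the same size scores the same, so the ranking only becomes meaningful on nodes with mixed link types. Mapping numeric values to nvidia-smi style link names would need a lookup table that is not given here.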