Closed: joachimweyl closed this issue 6 months ago
@naved001 you mentioned "prometheus returns the name of the hypervisor so that’s good." It sounds like this would require having a list of all Lenovo hypervisors to confirm if they are Lenovo. Is that correct?
If the estimate of 5 is too high given how easy it is to get the hypervisor name, please feel free to lower it.
"it sounds like this would require having a list of all Lenovo hypervisors to confirm if they are Lenovo. Is that correct?"
Exactly. Or, if the name of the hypervisor has "lenovo" (or some other identifying info) in it, then we don't have to maintain a list.
Do we need/have a second issue for capturing OpenShift multi-instance GPU (MIG) costs? https://www.redhat.com/en/blog/multi-instance-gpu-support-with-the-gpu-operator-v1.7.0?extIdCarryOver=true&sc_cid=701f2000001OH6fAAG
@msdisme https://github.com/CCI-MOC/ops-issues/issues/1039 is for testing MIG.
As we currently have no other A100s in OpenShift, this issue is a lower priority.
Awaiting OpenShift testing to find out the exact compute node name; then we can differentiate.
It sounds like the way to differentiate is to create a list of the compute nodes that are in this first batch of Lenovo loans and track usage against that list. Work to put this into use will be done in this issue.
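For discussion, here is a minimal sketch of the two options raised above: checking a node against a maintained list, or falling back to an identifying substring in the name. The node names, the `LENOVO_NODES` set, and the `is_lenovo` helper are all hypothetical; the real compute node names are not known until OpenShift testing is done.

```python
# Hypothetical node names; replace with the real compute node names
# once they are known from OpenShift testing.
LENOVO_NODES = {"node-a", "node-b"}


def is_lenovo(node_name, known_nodes=LENOVO_NODES, marker="lenovo"):
    """Return True if the node is in the maintained list or its name
    contains the identifying marker (case-insensitive)."""
    name = node_name.strip().lower()
    return name in known_nodes or marker in name
```

If the names turn out to carry no identifying info, the `marker` check simply never matches and the list alone decides.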
Motivation
We need to provide invoice data to Lenovo for only their A100s so that we pay them for the time their GPUs are used. Closely related to this issue.
Completion Criteria
Invoicing data has a way to distinguish Lenovo from non-Lenovo GPUs, or we generate a separate invoice for Lenovo that shows only their data.
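As a rough illustration of the first criterion, the sketch below totals GPU hours separately for Lenovo and non-Lenovo nodes. The `(node_name, gpu_hours)` record shape and the `split_gpu_hours` name are assumptions made for this example, not the format of the actual invoicing data.

```python
from collections import defaultdict


def split_gpu_hours(records, lenovo_nodes):
    """Sum GPU hours per owner group.

    records: iterable of (node_name, gpu_hours) pairs (assumed shape).
    lenovo_nodes: set of node names known to be on loan from Lenovo.
    """
    totals = defaultdict(float)
    for node, hours in records:
        owner = "lenovo" if node in lenovo_nodes else "non-lenovo"
        totals[owner] += hours
    return dict(totals)
```

The `"lenovo"` total could feed a separate Lenovo-only invoice, which would cover the second criterion as well.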
Description
Completion dates
Desired - 2024-02-27
Required - 2024-04-05