Closed dreamryx closed 2 years ago
@dreamryx looking at how this is implemented in the wrapper, we see:
// nvml.DeviceGetFieldValues()
func DeviceGetFieldValues(Device Device, Values []FieldValue) Return {
ValuesCount := len(Values)
return nvmlDeviceGetFieldValues(Device, int32(ValuesCount), &Values[0])
}
Which leads me to assume that the "addition" is happening in nvml
itself. Would you be able to repeat the test with code in C instead to confirm this?
@elezar C is OK. I set nvmlValue_st.value.uiVal to a number(eg. 8) and use it as input argument to call NVML_FI_DEV_ECC_SBE_AGG_TOTAL. The result is correct.
Hmm. Somthing strange is going on then because this is the entirety of the go code for this:
func DeviceGetFieldValues(Device Device, Values []FieldValue) Return {
ValuesCount := len(Values)
return nvmlDeviceGetFieldValues(Device, int32(ValuesCount), &Values[0])
}
func nvmlDeviceGetFieldValues(Device Device, ValuesCount int32, Values *FieldValue) Return {
cDevice, _ := *(*C.nvmlDevice_t)(unsafe.Pointer(&Device)), cgoAllocsUnknown
cValuesCount, _ := (C.int)(ValuesCount), cgoAllocsUnknown
cValues, _ := (*C.nvmlFieldValue_t)(unsafe.Pointer(Values)), cgoAllocsUnknown
__ret := C.nvmlDeviceGetFieldValues(cDevice, cValuesCount, cValues)
__v := (Return)(__ret)
return __v
}
It just translates the C call into the Go world and back. It never looks at or does anything with any of the fields.
Since FieldValue
includes a union
does it require special handling: https://groups.google.com/g/golang-nuts/c/G02z30Jncbk/m/RdPtTmllSTwJ ?
@elezar What is your question exactly? The union field in the C world is a [8]byte
in the Go world so that you can treat the 8 bytes as whatever you want similar to a union. The difference being that you have to do an explicit cast in the Go world (from the type you want to assign to a [8]byte
) instead of an uncasted assignment in the C world (to the name of the field associated with the type you want to assign).
@dreamryx Maybe it would help to see the exact Go code and the exact C code that you are running so that we can reproduce (and / or) spot an issue in the assignment being made.
@elezar @klueska Sorry, I made a mistake. Only when I set FieldId to NVML_FI_DEV_ECC_SBE_AGG_TOTAL, this bug can be seen. The C api has this bug.
@klueska @elezar 30 minutes ago, when I set FieldId to NVML_FI_DEV_NVLINK_LINK_COUNT and value.uiVal to a number(eg. 8), the result is correct(12). But if I set to NVML_FI_DEV_ECC_SBE_AGG_TOTAL, the bug shows in both go and C api.
The fact that the field ID has AGG_TOTAL
in it (i.e. aggregate total) may suggest that this is working as expected. You would need to check with the NVML team though. We only maintain the Go wrapper.
The best place to file an issue for NVML is here: https://forums.developer.nvidia.com/c/development-tools/other-tools/system-management-and-monitoring-nvml
The fact that the field ID has
AGG_TOTAL
in it (i.e. aggregate total) may suggest that this is working as expected. You would need to check with the NVML team though. We only maintain the Go wrapper.The best place to file an issue for NVML is here: https://forums.developer.nvidia.com/c/development-tools/other-tools/system-management-and-monitoring-nvml
OK. Thanks a lot!
This api does not put the target result value into the memory. Instead, it adds the value(dirty data in the memory where plan to store the result from DeviceGetFieldValues api) with the real result to return. Therefore, when we use the same memory structure to repeatedly call the api, the results obtained each time will increase linearly.
For example, this api need to use following struct:
The
Value [8]byte
is used to store and return the result value.FieldId
to 5(which means the NVML_FI_DEV_ECC_SBE_AGG_TOTAL) to get the total single bit aggregate (persistent) ECC errors.Value [8]byte
of the input argument FieldValue to int8(8) value.Value [8]byte
is 18.