Closed ruiwen-zhao closed 3 years ago
You can find more information about Xid errors here: https://docs.nvidia.com/deploy/xid-errors/index.html#topic_4
You can register for multiple events for a given device using the | operator.
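A minimal sketch of combining event types with |, assuming the event types are bit flags. The constants are plain integers defined locally so the snippet runs without NVML; the two values are the ones quoted in this thread (0x2 and 8), and the helper names `combine` and `is_set` are hypothetical, not part of any NVML binding.

```python
# Event-type flags as plain integers (local stand-ins, not imported from NVML).
# 0x2 and 0x8 are the values quoted in this thread for these two event types.
EVENT_TYPE_DOUBLE_BIT_ECC_ERROR = 0x2
EVENT_TYPE_XID_CRITICAL_ERROR = 0x8


def combine(*flags):
    """OR several event-type flags into one registration mask."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def is_set(mask, flag):
    """Return True if the given event-type flag is present in the mask."""
    return (mask & flag) != 0


# Hypothetical usage: one mask covering both event types.
mask = combine(EVENT_TYPE_DOUBLE_BIT_ECC_ERROR, EVENT_TYPE_XID_CRITICAL_ERROR)
print(hex(mask))
```

The resulting mask is what you would pass as the event type when registering; each bit selects one kind of event, which is why the value 8 here is a flag position and not an Xid code.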
Yes, the event data is the Xid code for that event: https://docs.nvidia.com/deploy/nvml-api/structnvmlEventData__t.html#structnvmlEventData__t
Thanks @guptaNswati! I guess my question is more: if I call registerEventForDevice with event type nvmlEventTypeXidCriticalError, will I get an event for any Xid error, or just for some of them? If it is just some of them, then which Xid errors are considered an XidCriticalError?
Yes, any xid error will trigger this event.
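To illustrate the distinction between the event type and the Xid code, here is a rough sketch under the assumptions stated in this thread: the event type is a flag (8 for nvmlEventTypeXidCriticalError) and the event data carries the Xid code. The `EventData` class is a hypothetical stand-in for nvmlEventData_t with only two fields, and `describe` is an illustrative helper, not an NVML function.

```python
from dataclasses import dataclass

# Flag value quoted in this thread for nvmlEventTypeXidCriticalError.
EVENT_TYPE_XID_CRITICAL_ERROR = 0x8


@dataclass
class EventData:
    """Hypothetical stand-in for nvmlEventData_t (only the fields used here)."""
    event_type: int  # bit flag saying which kind of event fired
    event_data: int  # for Xid events, the Xid code itself


def describe(event):
    """Report the Xid code if this is an Xid event, else the raw type mask."""
    if event.event_type & EVENT_TYPE_XID_CRITICAL_ERROR:
        return f"Xid {event.event_data}"
    return f"non-Xid event (type mask 0x{event.event_type:x})"


# Hypothetical events: the type is the same flag, the Xid code differs.
print(describe(EventData(EVENT_TYPE_XID_CRITICAL_ERROR, 48)))
print(describe(EventData(EVENT_TYPE_XID_CRITICAL_ERROR, 13)))
```

So the event type only says that an Xid error occurred; which Xid it was has to be read from the event data.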
@guptaNswati Thanks for the clarification!
Hi,
I have some questions about the relations between NVML's event type, event data, and the Xid error codes. I am posting them here to see if someone might have the answers.
If we register for event type nvmlEventTypeXidCriticalError, which is 8, will we be listening to Xid codes 8, 9, 11, 12, 13, and 24-31, etc.? If so, then why does event type nvmlEventTypeDoubleBitEccError (0x0000000000000002LL) not cover Xid 48? And if not, is there any doc showing which Xid errors are covered by each event type?