NVIDIA / DCGM

NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Apache License 2.0

python dcgm_structs.DcgmJSONEncoder does not recursively follow arrays #107

Open blackwer opened 12 months ago


As written, DcgmJSONEncoder only serializes arrays that sit at the root level; nested arrays are ignored and stored as generic object handles. In its type conditionals, the serializer only recurses into nested values that are _PrintableStructure instances, even though Array is also accepted as a valid type at the root level. See https://github.com/NVIDIA/DCGM/blob/4e0a87a8c7a9af99e0d911c491d9a65204f1a30e/testing/python3/dcgm_structs.py#L495

This came up when serializing the result of dcgm_agent.dcgmJobGetStats(handle.handle, str(jobId)).
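
For context, here is a small standalone illustration of why a nested array needs explicit handling. It uses only ctypes and json, not DCGM itself; the Inner and Outer structures are hypothetical stand-ins, not real DCGM types. Reading an array-typed field off a ctypes structure yields a ctypes Array instance, which the json module cannot serialize unless an encoder converts it first.

```python
import ctypes
import json


class Inner(ctypes.Structure):
    # Hypothetical stand-in structure, not a DCGM type.
    _fields_ = [("value", ctypes.c_int)]


class Outer(ctypes.Structure):
    # Hypothetical structure with a scalar field and two array fields.
    _fields_ = [
        ("count", ctypes.c_int),
        ("samples", ctypes.c_int * 3),
        ("items", Inner * 2),
    ]


outer = Outer()

# Scalar fields come back as plain Python ints...
field_is_int = isinstance(outer.count, int)

# ...but array fields come back as ctypes Array instances.
samples_is_array = isinstance(outer.samples, ctypes.Array)
items_is_array = isinstance(outer.items, ctypes.Array)

# The json module has no built-in handling for ctypes arrays.
try:
    json.dumps(outer.samples)
    array_is_serializable = True
except TypeError:
    array_is_serializable = False
```

So any array field that the encoder copies into its result dict without converting it remains a ctypes object rather than a JSON-friendly list.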

A basic fix might be:

#JSON serializer for DCGM structures
class DcgmJSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, _PrintableStructure):
            retVal = {}
            for fieldName, fieldType in o._fields_:
                subObj = getattr(o, fieldName)
                if isinstance(subObj, (_PrintableStructure, Array)):
                    subObj = self.default(subObj)

                retVal[fieldName] = subObj

            return retVal
        elif isinstance(o, Array):
            retVal = []
            for i in range(len(o)):
                subVal = {}
                if isinstance(o[i], _PrintableStructure):
                    for fieldName, fieldType in o[i]._fields_:
                        subObj = getattr(o[i], fieldName)
                        if isinstance(subObj, (_PrintableStructure, Array)):
                            subObj = self.default(subObj)

                        subVal[fieldName] = subObj

                    retVal.append(subVal)
                else:
                    retVal.append(o[i])

            return retVal

        #Let the parent class handle this/fail
        return json.JSONEncoder.default(self, o)
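
The same idea could probably be written more compactly by pushing the recursion into one helper, so that structures, arrays of structures, and arrays of arrays all go through a single path. The following is only a sketch under stated assumptions: _PrintableStructure here is a bare stand-in for the class in dcgm_structs.py, and to_jsonable, RecursiveEncoder, GpuUsage, and JobStats are hypothetical names made up for the example. It does not cover everything the real module may need, such as bytes values from c_char fields.

```python
import ctypes
import json
from ctypes import Array, Structure


class _PrintableStructure(Structure):
    """Stand-in for dcgm_structs._PrintableStructure (assumption)."""


def to_jsonable(obj):
    """Recursively convert ctypes structures and arrays to dicts and lists."""
    if isinstance(obj, _PrintableStructure):
        return {
            fieldName: to_jsonable(getattr(obj, fieldName))
            for fieldName, fieldType in obj._fields_
        }
    if isinstance(obj, Array):
        # Elements may be scalars, structures, or further arrays.
        return [to_jsonable(item) for item in obj]
    return obj


class RecursiveEncoder(json.JSONEncoder):
    """Hypothetical encoder that delegates to to_jsonable."""

    def default(self, o):
        if isinstance(o, (_PrintableStructure, Array)):
            return to_jsonable(o)
        # Let the parent class handle this/fail
        return super().default(o)


class GpuUsage(_PrintableStructure):
    # Hypothetical structure holding a nested scalar array.
    _fields_ = [("gpuId", ctypes.c_uint), ("pids", ctypes.c_uint * 2)]


class JobStats(_PrintableStructure):
    # Hypothetical structure with an array of structures and a 2-D array.
    _fields_ = [
        ("numGpus", ctypes.c_int),
        ("gpus", GpuUsage * 2),
        ("grid", (ctypes.c_int * 2) * 2),
    ]


stats = JobStats()
stats.numGpus = 2
stats.gpus[1].gpuId = 1
stats.gpus[1].pids[0] = 4242
stats.grid[1][0] = 7

decoded = json.loads(json.dumps(stats, cls=RecursiveEncoder))
```

One difference from the version above: array elements that are themselves arrays are also recursed into, rather than appended as-is.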

Happy to contribute this as a PR if it is considered a bug rather than a feature. The change could also affect https://github.com/NVIDIA/DCGM/blob/4e0a87a8c7a9af99e0d911c491d9a65204f1a30e/dcgm_wsgi/dcgm_wsgi.py#L134