Data-ScienceHub / mlcommons-science


Cloudmesh-GPU not working on Rivanna #50

Closed rknuu closed 2 years ago

rknuu commented 2 years ago

Found the issue with cloudmesh-gpu: it doesn't appear to handle multi-GPU systems. When dropping extra parameters from the smi(output="json") call, the attribute loop assumes that the result is a dictionary rather than a list of dictionaries. I'm working on some logic to handle that scenario now.
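One way to handle the dictionary-versus-list case described above is to normalize the result before looping over attributes. The following is only a minimal sketch: `as_gpu_list` and `collect_attribute` are hypothetical helper names, not part of cloudmesh-gpu, and the record layout is assumed for illustration.

```python
from typing import Any, Dict, List, Union

GpuRecord = Dict[str, Any]


def as_gpu_list(result: Union[None, GpuRecord, List[GpuRecord]]) -> List[GpuRecord]:
    """Return the result as a list of per-GPU dictionaries.

    Hypothetical helper: a single dictionary (one GPU) is wrapped in a list,
    a list (several GPUs) is copied, and None becomes an empty list.
    """
    if result is None:
        return []
    if isinstance(result, dict):
        return [result]
    return list(result)


def collect_attribute(result: Union[None, GpuRecord, List[GpuRecord]], name: str) -> List[Any]:
    """Collect one attribute from every GPU record, whatever shape came in."""
    return [gpu.get(name) for gpu in as_gpu_list(result)]


# Assumed example data; the real keys depend on what the smi call returns.
single = {"product_name": "GPU A"}
multi = [{"product_name": "GPU A"}, {"product_name": "GPU B"}]

names_single = collect_attribute(single, "product_name")
names_multi = collect_attribute(multi, "product_name")
```

With this shape, the attribute loop no longer needs to know whether the machine has one GPU or several.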

laszewsk commented 2 years ago

Maybe we can also add a def __str__() that prints it out in a nice format. Printer.write can handle lists and dicts but assumes a pretty much flat structure. I also have a flatdict feature in cloudmesh, so if dicts are nested we can simply apply flatdict, which creates "."-separated keys, and then just use Printer.write.
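To illustrate the flattening idea, here is a minimal, self-contained sketch of turning a nested dictionary into one with "."-separated keys. It is not cloudmesh's flatdict implementation; the function name `flatten` and the sample record are assumptions made for this example.

```python
from typing import Any, Dict


def flatten(data: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with joined keys.

    Hypothetical stand-in for a flatdict-style helper: nested keys are
    joined with `sep`, and non-dict values are copied unchanged.
    """
    flat: Dict[str, Any] = {}
    for key, value in data.items():
        name = f"{parent}{sep}{key}" if parent else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Assumed example record; real smi output will have different keys.
record = {"gpu": {"name": "GPU A", "memory": {"total": 16, "used": 2}}}
flat_record = flatten(record)
```

A flat record like this is the kind of structure a table printer that expects a flat dictionary can work with.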

laszewsk commented 2 years ago

I noticed I ran into this before, so I even have a Printer.flatwrite, so we may give that a try ...

laszewsk commented 2 years ago

I merged our code. Because there was a conflict, can you please confirm the result? For example, I added a strip to the vendor.

Also, I think we should always return an array, even if there is only one GPU, so that the result is consistent. I have not had time to look into what you implemented. We also want to add a property

gpu.count

so we can find out how many GPUs are in the system.
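A rough sketch of how an always-a-list result and a count property could fit together is shown below. The class name `GpuInfo`, its constructor argument, and the `system` method are hypothetical and only illustrate the idea; they are not the cloudmesh-gpu API.

```python
from typing import Any, Dict, List, Union

GpuRecord = Dict[str, Any]


class GpuInfo:
    """Hypothetical wrapper that always stores GPU records as a list."""

    def __init__(self, raw: Union[None, GpuRecord, List[GpuRecord]]):
        # Normalize once, so every other method can rely on a list.
        if raw is None:
            self._gpus: List[GpuRecord] = []
        elif isinstance(raw, dict):
            self._gpus = [raw]
        else:
            self._gpus = list(raw)

    @property
    def count(self) -> int:
        """Number of GPUs found in the system."""
        return len(self._gpus)

    def system(self) -> List[GpuRecord]:
        """Return a copy of the per-GPU records, even for a single GPU."""
        return list(self._gpus)


one = GpuInfo({"vendor": "A"})
two = GpuInfo([{"vendor": "A"}, {"vendor": "B"}])
none = GpuInfo(None)
```

Here `one.count` is 1, `two.count` is 2, and `none.count` is 0, while `system()` returns a list in all three cases.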

laszewsk commented 2 years ago

I added a counter command and an API template. Can you please complete it and set the number in gpu.count?

laszewsk commented 2 years ago

I implemented it in a way that is maybe not so elegant, but it works on a one-processor machine. Can you check on Rivanna?