When a user specifies a first_device parameter, we do not use it to format the enumerated devices IFF the num_devices_per_node != 1. This can lead to some unexpected formatting of enumerated devices.
How to reproduce
Consider the Following Script
from smartsim.entity.dbobject import DBModel
def pprint_db_obj(obj):
print(f"{obj.name}: {obj.devices}")
def main():
o1 = DBModel(name="Normal Object", model=b"some-model", backend="TORCH",
device="GPU")
o2 = DBModel(name="Object w/ Device as str", model=b"some-model", backend="TORCH",
device="GPU:1")
o3 = DBModel(name="Object w/ First Device", model=b"some-model", backend="TORCH",
device="GPU", first_device=1)
o4 = DBModel(name="Object w/ Single Enumerated First Device", model=b"some-model", backend="TORCH",
device="GPU", first_device=1, devices_per_node=1)
o5 = DBModel(name="Object w/ Multi Enumerated First Device", model=b"some-model", backend="TORCH",
device="GPU", first_device=1, devices_per_node=3)
o6 = DBModel(name="Object w/ Conflicting First Device", model=b"some-model", backend="TORCH",
device="GPU:123", first_device=456)
pprint_db_obj(o1)
pprint_db_obj(o2)
pprint_db_obj(o3)
pprint_db_obj(o4)
pprint_db_obj(o5)
pprint_db_obj(o6)
print("=" * 20)
o7 = DBModel(name="Object where/ First Device should be 123", model=b"some-model", backend="TORCH",
device="GPU", first_device=123)
pprint_db_obj(o7)
print(" Yikes!! We have no pin for this GPU!!! ^^^")
return 0
if __name__ == "__main__":
raise SystemExit(main())
This is the current output:
Normal Object: ['GPU']
Object w/ Device as str: ['GPU:1']
Object w/ First Device: ['GPU']
Object w/ Single Enumerated First Device: ['GPU']
Object w/ Multi Enumerated First Device: ['GPU:1', 'GPU:2', 'GPU:3']
Object w/ Conflicting First Device: ['GPU:123']
====================
Object where/ First Device should be 123: ['GPU']
Yikes!! We have no pin for this GPU!!! ^^^
Expected behavior
This is what I would expect for output
Normal Object: ['GPU']
Object w/ Device as str: --> N/A: Constructing a DBObject where `device` != `CPU` or `GPU` (case insensitive) should raise an error <--
--> NOTE: THIS WOULD CONSTITUTE AN API BREAK <--
Object w/ First Device: ['GPU:1']
Object w/ Single Enumerated First Device: ['GPU:1']
Object w/ Multi Enumerated First Device: ['GPU:1', 'GPU:2', 'GPU:3']
Object w/ Conflicting First Device: --> N/A: Constructing a DBObject where `device` != `CPU` or `GPU` (case insensitive) should raise an error <--
--> NOTE: THIS WOULD CONSTITUTE AN API BREAK <--
====================
Object where/ First Device should be 123: ['GPU:123']
Hooray! This is what I expected!! ^^^^^^^
But of course this can be open to discussion/debate. At the very least, I think we should error if a user specifies device="GPU:123" and first_device=456.
Description
When a user specifies a
first_device
parameter, we do not use it to format the enumerated devices IFF thenum_devices_per_node != 1
. This can lead to some unexpected formatting of enumerated devices.How to reproduce
Consider the Following Script
This is the current output:
Expected behavior
This is what I would expect for output
But of course this can be open to discussion/debate. At the very least, I think we should error if a user specifies
device="GPU:123"
andfirst_device=456
.System
Any/All