Version
current main
On which installation method(s) does this occur?
No response
Describe the issue
This behavior is dangerous: when the DistributedManager is queried for the size of a group whose name was never created, the group lookup first returns None here:
https://github.com/NVIDIA/modulus/blob/0e3da620efec3101fbda62b85b33dd862b945e09/modulus/distributed/manager.py#L131
That None is then passed on to dist.get_world_size(), which treats it as the default group and returns the world group size. I think a better behavior would be to raise an error (probably the cleanest option), or to return 1 for the size and 0 for the rank.
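To illustrate the failure mode, here is a minimal sketch that runs without a process group. It does not use the real classes: `SketchManager`, `fake_get_world_size`, `group_size_strict`, and the assumed world size of 8 are all hypothetical stand-ins that only mimic the "None means the world group" behavior described above.

```python
from typing import Dict, List, Optional

WORLD_SIZE = 8  # assumed world size, for illustration only


def fake_get_world_size(group: Optional[List[int]] = None) -> int:
    """Hypothetical stand-in: group=None is treated as the world group."""
    if group is None:
        return WORLD_SIZE
    return len(group)


class SketchManager:
    """Hypothetical, simplified stand-in for DistributedManager."""

    def __init__(self) -> None:
        # One group that was actually created, containing ranks 0 and 1.
        self._groups: Dict[str, List[int]] = {"model_parallel": [0, 1]}

    def group(self, name: str) -> Optional[List[int]]:
        # Current behavior: an unknown name silently maps to None.
        return self._groups.get(name)

    def group_size(self, name: str) -> int:
        # None is forwarded, so an unknown name yields the world size.
        return fake_get_world_size(self.group(name))

    def group_size_strict(self, name: str) -> int:
        # Proposed behavior: fail loudly when the group was never created.
        if name not in self._groups:
            raise KeyError(f"process group '{name}' was never created")
        return fake_get_world_size(self._groups[name])
```

In this sketch, `group_size("typo")` quietly returns 8 (the world size), whereas `group_size_strict("typo")` raises a `KeyError`, so a misspelled or missing group name is caught immediately.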
Please let me know what you think
Minimum reproducible example
No response
Relevant log output
No response
Environment details
No response