Dora is an experiment management framework. It expresses grid searches as pure Python files that live in your repo, and it identifies each experiment with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.
MIT License
dora_distrib.world_size() returns 1 when I have two GPUs #47
❓ Questions
I am training my model on two NVIDIA 4090s. Whenever the following code is run:
world_size = dora_distrib.world_size()
print(world_size)
world_size is equal to 1, even though torch.cuda.device_count() returns 2. I tried wrapping my model in DDP and in DataParallel, but to no avail.
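One thing that may help narrow this down: in torch.distributed, the world size is the number of processes in the process group, not the number of visible GPUs, so a job started as a single process reports 1 no matter how many cards are installed. The following is a minimal diagnostic sketch, assuming the launcher exports the usual WORLD_SIZE and RANK environment variables (torchrun does; whether Dora's launcher and dora_distrib.world_size() follow the same convention is an assumption here). The helper name describe_distributed_state is hypothetical and not part of Dora or PyTorch.

```python
import os


def describe_distributed_state(env=None):
    """Hypothetical helper: report process-level distributed state.

    Assumes the launcher exports WORLD_SIZE and RANK (as torchrun does).
    Falls back to a single-process view when they are absent.
    """
    env = os.environ if env is None else env
    state = {
        # Number of processes the launcher says it started (1 if unset).
        "env_world_size": int(env.get("WORLD_SIZE", "1")),
        # Index of this process among them (0 if unset).
        "env_rank": int(env.get("RANK", "0")),
        # Filled in below only when PyTorch is importable.
        "visible_gpus": None,
        "process_group_initialized": None,
    }
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        return state

    # GPUs visible to this one process; independent of the world size.
    state["visible_gpus"] = torch.cuda.device_count()
    # True only after a process group has been initialized in this process.
    state["process_group_initialized"] = (
        dist.is_available() and dist.is_initialized()
    )
    return state


if __name__ == "__main__":
    print(describe_distributed_state())
```

If this prints an env_world_size of 1 and process_group_initialized as False while visible_gpus is 2, the script is most likely running as a single process, and wrapping the model in DDP would not change the reported world size.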
I would appreciate someone shedding light on why this is happening.
Thanks