facebookresearch / dora

Dora is an experiment management framework. It expresses grid searches as pure Python files as part of your repo. It identifies experiments with a unique hash signature. Scale up to hundreds of experiments without losing your sanity.
MIT License

World size from dora_distrib.world_size() is 1 when I have two GPUs #47

Closed: temismink closed this issue 1 year ago

temismink commented 1 year ago

❓ Questions

I am training my model on two NVIDIA RTX 4090 GPUs. Whenever the following code runs:

    world_size = dora_distrib.world_size()
    print(world_size)

world_size is 1, even though torch.cuda.device_count() returns 2. I have tried wrapping my model in DistributedDataParallel (DDP) and in DataParallel, but to no avail.

I would appreciate it if someone could shed some light on why this happens.

Thanks

adefossez commented 1 year ago

Are you running with dora run -d?

temismink commented 1 year ago


@adefossez All working now, thank you!
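The fix works because the world size counts the processes launched for the job, not the GPUs visible to one process. A plain single-process launch reports 1 no matter how many GPUs torch.cuda.device_count() sees, and wrapping the model in DDP or DataParallel does not start extra processes. Below is a minimal sketch of that idea. It assumes a torchrun-style launcher that exports a WORLD_SIZE environment variable to each process. launcher_world_size is a hypothetical helper, not Dora's implementation, and the way dora run -d actually passes this information to dora_distrib may differ.

```python
import os


def launcher_world_size(env=None) -> int:
    """Hypothetical helper: return the number of processes in the job.

    Assumes the launcher (e.g. torchrun) exports WORLD_SIZE to every process.
    A script started as one plain process has no WORLD_SIZE set, so the job
    has exactly one process, regardless of how many GPUs are visible.
    """
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", "1"))


if __name__ == "__main__":
    # Plain launch: nothing sets WORLD_SIZE, so the job has 1 process.
    print(launcher_world_size({}))
    # Simulated distributed launch with one process per GPU on two GPUs.
    print(launcher_world_size({"WORLD_SIZE": "2"}))
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.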