Hi,
See #189. I have a very similar problem with speed.
CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
MB: SuperMicro X10DRG-Q
GPUs tested: 1x RTX 2080, 10x RTX 2080
I tried printing this after every iteration and it returns the same value on all GPUs:
logger.info(sum(p.sum().item() for p in model.parameters()))
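(For reference, an equivalent check that also tags each value with its process rank makes the per-GPU values easier to compare side by side. This is just a sketch, assuming the script runs under torch.distributed with one process per GPU; `log_param_checksum` is a hypothetical helper, not part of XLM:)

```python
import torch.distributed as dist

def log_param_checksum(model, logger):
    # Sum of all parameter values; identical values across ranks mean the
    # weights are still synchronized after the optimizer step.
    checksum = sum(p.detach().sum().item() for p in model.parameters())
    rank = dist.get_rank() if dist.is_initialized() else 0
    logger.info("rank %d: parameter checksum = %.6f", rank, checksum)
```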
Also, I am getting this warning:
/tmp/pip-req-build-ocx5vxk7/aten/src/ATen/native/IndexingUtils.h:20: UserWarning: indexing with dtype torch.uint8 is now deprecated, please use a dtype torch.bool instead.
Comment: I downgraded to pytorch==1.0.1 and the warning disappeared. The speed problem is still present.
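(For context, the warning refers to indexing a tensor with a torch.uint8 mask; on recent PyTorch versions casting the mask to torch.bool performs the same selection without the deprecation message. A minimal sketch, not tied to this codebase:)

```python
import torch

x = torch.randn(4)
mask = torch.tensor([1, 0, 1, 0], dtype=torch.uint8)

# x[mask] triggers the UserWarning above; a bool mask selects the same elements.
selected = x[mask.bool()]
```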
Overall, it looks like training with 10x RTX 2080 is only slightly faster than with 1x RTX 2080, and I am getting very similar results on another server with 5x 1080. I have used Horovod for parallel training before and its scaling was always fine (not linear, but close enough). Is there anything I can try?
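(One generic way to narrow down a scaling problem like this is to time a single all_reduce of a gradient-sized tensor across all GPUs; if that alone is slow, inter-GPU communication rather than the model is the bottleneck. A rough sketch, assuming torch.distributed with the NCCL backend is already initialized and each process has set its CUDA device; the tensor size and repeat count are arbitrary:)

```python
import time
import torch
import torch.distributed as dist

def time_allreduce(numel=50_000_000, repeats=10):
    # numel is an assumption, roughly the size of a large model's flattened gradients.
    buf = torch.randn(numel, device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(repeats):
        dist.all_reduce(buf)
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / repeats
    print(f"rank {dist.get_rank()}: avg all_reduce time {elapsed:.3f}s")
```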
Thanks.
Originally posted by @Jamiroquai88 in https://github.com/facebookresearch/XLM/issues/189#issuecomment-647415523