index drawn by AliasMethod is not on the same gpu as the model

HobbitLong / CMC

[arXiv 2019] "Contrastive Multiview Coding", also contains implementations for MoCo and InstDis

BSD 2-Clause "Simplified" License


index drawn by AliasMethod is not on the same gpu as the model #58

Closed. dongyaoli10x closed this issue 4 years ago

dongyaoli10x commented 4 years ago

Not sure if I missed something, but it seems to me that when you train on multiple GPUs with the current implementation, AliasMethod puts the drawn index on the default GPU. memory_l and memory_ab are on the correct GPU because they are registered with register_buffer. As a result, torch.index_select(self.memory_l, 0, idx.view(-1)).detach() would raise an "arguments are located on different GPUs" error.

dongyaoli10x commented 4 years ago

OK, now I've figured out why. In the current implementation, only the encoder is wrapped in DataParallel; contrast is not, so the loss computation happens on a single GPU. This makes the register_buffer for the memory bank pointless. If contrast were put into DataParallel, AliasMethod would still not end up on the correct GPU. The right way to go is probably DDP, as you implemented in PyContrast.
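
For anyone hitting the same error, here is a minimal sketch of one possible workaround. It is not the repository's code: MemoryBankSketch and its parameters are hypothetical names, and torch.multinomial is used as a stand-in for AliasMethod. The idea is to keep the sampling probabilities in a registered buffer so that they move (and are replicated) together with the memory bank, and to move the drawn index onto the memory's device before calling index_select.

```python
import torch
import torch.nn as nn


class MemoryBankSketch(nn.Module):
    """Hypothetical stand-in for the memory bank module (not the CMC code)."""

    def __init__(self, n_data, feat_dim, n_neg):
        super().__init__()
        self.n_neg = n_neg
        # Registered buffers follow the module on .to(device) and are
        # copied to each replica when the module is wrapped in DataParallel.
        self.register_buffer("memory", torch.randn(n_data, feat_dim))
        self.register_buffer("probs", torch.ones(n_data) / n_data)

    def forward(self, feat):
        batch = feat.size(0)
        # Stand-in for the alias-method draw: the result lives on the same
        # device as self.probs, i.e. on this replica's device.
        idx = torch.multinomial(
            self.probs, batch * (self.n_neg + 1), replacement=True
        )
        # Defensive: make sure the index is on the memory's device.
        idx = idx.to(self.memory.device)
        weight = torch.index_select(self.memory, 0, idx.view(-1)).detach()
        return weight.view(batch, self.n_neg + 1, -1)


bank = MemoryBankSketch(n_data=50, feat_dim=8, n_neg=4)
out = bank(torch.randn(3, 8))
```

This only addresses where the index is placed. It does not keep per-replica memory updates in sync under DataParallel, which is one more reason DDP may be the cleaner route.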