Closed dongyaoli10x closed 4 years ago
OK, now I figured out why. In the current implementation, only the encoder is wrapped in `DataParallel`; the contrast module is not. So the loss computation happens on only one GPU, which renders the `register_buffer` of the memory bank useless. If you put the contrast module into `DataParallel`, it won't place `AliasMethod` on the correct GPU. Probably the right way to go is DDP, as you implemented in PyContrast.
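One possible direction for the `AliasMethod` placement problem, short of moving to DDP, is to keep the alias tables as buffers of an `nn.Module` so they travel with the module when it is moved or replicated. The sketch below is only an illustration under that assumption: `AliasSamplerSketch` and its `prob`/`alias` buffers are hypothetical names, not the repository's `AliasMethod`, and I have not tested it inside `DataParallel` with this codebase.

```python
import torch
import torch.nn as nn


class AliasSamplerSketch(nn.Module):
    """Hypothetical alias-method sampler whose tables are registered buffers."""

    def __init__(self, probs: torch.Tensor):
        super().__init__()
        probs = probs / probs.sum()
        k = probs.numel()
        prob = torch.zeros(k)
        alias = torch.zeros(k, dtype=torch.long)
        scaled = (probs * k).tolist()
        small = [i for i, p in enumerate(scaled) if p < 1.0]
        large = [i for i, p in enumerate(scaled) if p >= 1.0]
        # Pair each under-full bin with an over-full one (Vose's construction).
        while small and large:
            s, l = small.pop(), large.pop()
            prob[s] = scaled[s]
            alias[s] = l
            scaled[l] = scaled[l] - (1.0 - scaled[s])
            (small if scaled[l] < 1.0 else large).append(l)
        # Whatever is left over is accepted with probability 1.
        for i in small + large:
            prob[i] = 1.0
        # Buffers move with .to(device) and are copied when the module is replicated.
        self.register_buffer("prob", prob)
        self.register_buffer("alias", alias)

    def draw(self, n: int) -> torch.Tensor:
        # Sample directly on the device that holds the tables.
        device = self.prob.device
        k = self.prob.numel()
        kk = torch.randint(0, k, (n,), device=device)
        accept = torch.rand(n, device=device) < self.prob[kk]
        return torch.where(accept, kk, self.alias[kk])


sampler = AliasSamplerSketch(torch.ones(10))
idx = sampler.draw(32)  # indices live on the same device as sampler.prob
```

With this layout the sampled indices are created on whichever device the sampler's buffers are on, so no explicit `.cuda()` call is needed inside the sampler.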
Not sure if I missed something, but it seems to me that if you train on multiple GPUs with the current implementation, `AliasMethod` puts the index on the default GPU, while `memory_l` and `memory_ab` are on the correct GPU thanks to `register_buffer`. Then `torch.index_select(self.memory_l, 0, idx.view(-1)).detach()` gives an `arguments are located on different GPUs` error.
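A minimal sketch of one way to avoid the device mismatch, assuming the only problem is that `idx` and `memory_l` sit on different devices: move the indices to the buffer's device right before the `index_select` call. `MemoryBankSketch` is a hypothetical stand-in for the memory bank, not the repository's class, and the sizes are made up for illustration.

```python
import torch
import torch.nn as nn


class MemoryBankSketch(nn.Module):
    """Hypothetical stand-in for the memory bank; not the repository's class."""

    def __init__(self, n_data: int, feat_dim: int):
        super().__init__()
        # Registered buffers follow the module across .to(device) calls.
        self.register_buffer("memory_l", torch.randn(n_data, feat_dim))

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        # Move the sampled indices to the buffer's device before indexing,
        # so both arguments of index_select live on the same device.
        idx = idx.to(self.memory_l.device)
        weight = torch.index_select(self.memory_l, 0, idx.view(-1)).detach()
        return weight.view(*idx.shape, -1)


bank = MemoryBankSketch(n_data=100, feat_dim=8)
idx = torch.randint(0, 100, (4, 16))  # indices created on the default (CPU) device
out = bank(idx)
```

This only addresses the error message itself; it does not change the fact that the loss is still computed on a single GPU when the contrast module is outside `DataParallel`.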