Question on memory consumption for CRD loss when the dataset is very large

TMaysGGS commented 3 years ago

Hi,

Thank you for your great work, which helps me a lot.

I want to ask about the CRD contrast memory. In class ContrastMemory, there will be 2 buffers generated as 2 random tensor, for each is of the shape (number of data, number of features). Assume the number of features is 128, this buffer will become really huge when training with a large dataset, such as Glint 360k. Actually I tried to use CRD for my face recognition project and there are 17091657 pictures in this dataset, which leads to a outbread use for the GPU memory and there is no room for training.

I wonder if you can tell me if I am understanding this part right, and if I am right, is there any solution for this problem? Thanks.

Xinxinatg commented 3 years ago

Hey TMays I am coming accross the same issue here. Have you been able to solve it?

TMaysGGS commented 2 years ago

Hey TMays I am coming accross the same issue here. Have you been able to solve it?

Sry not yet. Since the original distillation method is useful enough, I do not add any extra loss for my training for now.

HobbitLong commented 2 years ago

Hi, @TMaysGGS , sry this is a late reply, and maybe you have figured it out. But if you are interested, there are two solutions:

you can use the Momentum Encoder trick in MoCo paper, and then you will only need a fixed-length queue.
if you can maintain a large batch size, then you can directly perform contrastive loss without memory buffer.

HobbitLong / RepDistiller

Question on memory consumption for CRD loss when the dataset is very large #40