Hi @liu115 - Thank you for your question. You are totally right: the negative keys are sampled across the entire mini-batch (more accurately, only the per-GPU mini-batch for distributed training, since we do not do all_gather operations across GPUs when computing the loss).
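In simplified form, the loss looks like this (a minimal PyTorch sketch, not our exact code; `q`/`k` are hypothetical names for the features of the sampled pairs in P, flattened across the scenes in the per-GPU mini-batch):

```python
import torch
import torch.nn.functional as F

def point_info_nce(q: torch.Tensor, k: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """q, k: (N, D) features for the N matched pairs in P, flattened across
    all scenes in the per-GPU mini-batch; row i of q matches row i of k."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    # Every key serves as a negative for every non-matching query. Because
    # the pairs are flattened across scenes, the off-diagonal (negative)
    # keys can come from other scenes in the same mini-batch.
    logits = q @ k.t() / tau                           # (N, N)
    labels = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```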
Unfortunately this is not clearly noted in the paper where we describe the formulation. However, we recently ran additional experiments pre-training with negatives sampled only from the scene that contains the positive, and the results are very similar, at least on the S3DIS semantic segmentation task we tested.
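That within-scene variant can be sketched by masking out cross-scene keys before the softmax (again a hypothetical sketch rather than the exact experimental code; `scene_ids` is an assumed per-pair scene index):

```python
import torch
import torch.nn.functional as F

def point_info_nce_per_scene(q, k, scene_ids, tau: float = 0.07):
    """Same inputs as above, plus scene_ids: (N,) integer id of the scene
    each pair in P belongs to."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / tau
    # Keep only keys from the same scene as the query; the matching key on
    # the diagonal always survives, so every row stays well-defined.
    same_scene = scene_ids[:, None] == scene_ids[None, :]
    logits = logits.masked_fill(~same_scene, float('-inf'))
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)
```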
Let me know if you have further questions.
Thank you for the clarification.
Thank you for the great work. I really love it.
I have a question about the PointInfoNCE implementation: are the negative samples in PointInfoNCE drawn across the batch dimension?
The paper defines PointInfoNCE as:
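(reproducing the equation as I understand it, where $\mathbf{f}^1_i$ and $\mathbf{f}^2_k$ are the point features from the two views and $\tau$ is the temperature)

```math
\mathcal{L}_c = -\sum_{(i,j)\in P} \log \frac{\exp(\mathbf{f}^1_i \cdot \mathbf{f}^2_j / \tau)}{\sum_{(\cdot,k)\in P} \exp(\mathbf{f}^1_i \cdot \mathbf{f}^2_k / \tau)}
```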
I wonder whether all the pairs in the matched pair set P come from the two views of the same scene, or from a larger pool. When I read the paper and pseudocode, I assumed the batch dimension is ignored: for a point in x^1, the negative keys all come from x^2, and P is downsampled to 4096 point pairs for each mini-batch.
However, in the implementation (`ddp_trainer.PointNCELossTrainer`), it seems like the negative keys are drawn across the batch dimension, so the negative samples for a point in x^1 may come from points in other scenes. Am I correct?