Closed by gbliao 6 months ago
Hi, caption_idx is the label index for each pooled 3D feature. Since we remove duplicate captions within a batch, we need to record the label index to preserve the correspondence between each 3D feature and its language feature.
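To make the idea concrete, here is a minimal sketch in plain Python of how such an index could be built and used as a classification target. It is not the repository's code: the helper names (`dedup_captions`, `cross_entropy`) and the toy feature values are assumptions chosen for illustration only.

```python
import math

def dedup_captions(captions):
    """Hypothetical helper: drop duplicate captions and record, for each
    original caption, its position in the deduplicated list."""
    unique, index_of, caption_idx = [], {}, []
    for cap in captions:
        if cap not in index_of:
            index_of[cap] = len(unique)
            unique.append(cap)
        caption_idx.append(index_of[cap])
    return unique, caption_idx

def cross_entropy(logits, targets):
    """Mean cross-entropy over rows, using a numerically stable log-sum-exp."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[t]
    return total / len(targets)

# One caption per pooled 3D feature; "a chair" appears twice in the batch.
captions = ["a chair", "a table", "a chair", "a lamp"]
unique, caption_idx = dedup_captions(captions)

# Toy stand-ins for pooled 3D features (4 items) and language features
# (3 unique captions); real features would come from the two encoders.
pooled_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
text_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Similarity of every 3D feature to every unique caption feature.
logits = [[sum(p * t for p, t in zip(pf, tf)) for tf in text_feats]
          for pf in pooled_feats]

# caption_idx selects, per 3D feature, which column is the matching caption.
loss = cross_entropy(logits, caption_idx)
```

In this sketch the first and third 3D features share the same target column, so the target is an integer index into the deduplicated captions rather than the language feature itself.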
Thank you for your response. Am I right in understanding that the label refers to the language feature?
Hi, thanks for your great work! I understand the meaning of equation (18), but I found in the code that the target of the loss function is caption_idx. Could you explain the exact meaning of caption_idx?