anosorae / IRRA

Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)
MIT License

Mistake in the MLM module: the masked-token output is the same as the full-token output #29

Closed: jssyzsfzy closed this issue 1 year ago

jssyzsfzy commented 1 year ago

The output of `ImageTextMLMDataset._build_random_masked_tokens_and_labels` is the same as the output of `tokenize`. This is caused by `copy()`; `deepcopy()` is required to get a new caption. Otherwise, the `ImageTextMLMDataset` dataloader returns `'caption_ids': caption_tokens` and `'mlm_ids': mlm_tokens` with identical contents, which differs from the pipeline figure in the paper. Thanks.
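A minimal sketch of the aliasing pitfall described above, not IRRA's actual code. It assumes a masking routine that writes mask ids directly into the array it receives; `mask_tokens_in_place`, `MASK_ID`, and the token ids are all hypothetical placeholders. When that routine is given a view that shares memory with the caption tokens, the caption is masked too and both outputs end up identical. Giving it an independent copy keeps the two apart.

```python
import random
import numpy as np

MASK_ID = 103  # hypothetical mask-token id, for illustration only

def mask_tokens_in_place(tokens, rng, mask_prob=0.15):
    """Hypothetical MLM-style masking that writes into `tokens` directly.

    Returns (tokens, labels); labels hold the original id at masked
    positions and 0 elsewhere.
    """
    labels = np.zeros_like(tokens)
    for i in range(1, len(tokens) - 1):  # skip start/end tokens
        if rng.random() < mask_prob:
            labels[i] = tokens[i]
            tokens[i] = MASK_ID  # in-place write into the caller's buffer
    if not labels.any():
        # guarantee at least one masked position
        labels[1] = tokens[1]
        tokens[1] = MASK_ID
    return tokens, labels

# Made-up caption token ids.
caption_tokens = np.array([101, 7, 8, 9, 102])

# Buggy: the masking routine receives a view sharing memory with caption_tokens.
shared = caption_tokens.view()
mlm_tokens, _ = mask_tokens_in_place(shared, random.Random(0))
print(np.array_equal(caption_tokens, mlm_tokens))  # True: caption was masked too

# Fixed: the masking routine gets its own copy of the buffer.
caption_tokens = np.array([101, 7, 8, 9, 102])
mlm_tokens, _ = mask_tokens_in_place(caption_tokens.copy(), random.Random(0))
print(np.array_equal(caption_tokens, mlm_tokens))  # False: caption intact
```

In PyTorch, `Tensor.numpy()` on a CPU tensor returns an array that shares memory with the tensor, so the same aliasing can arise when a tokenized caption is converted this way and then modified in place. For a flat integer array, `ndarray.copy()` or `Tensor.clone()` already produces an independent buffer.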