In ImageTextMLMDataset, the output of _build_random_masked_tokens_and_labels is the same as the output of tokenize. I think this is due to the copy(): it seems to require deepcopy() to get a new caption. Otherwise, in the dataloader output of ImageTextMLMDataset,
'caption_ids': caption_tokens, 'mlm_ids': mlm_tokens,
the two entries are the same, which differs from the pipeline figure in the paper.
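
Here is a minimal sketch of the behaviour I mean. It uses plain Python lists, not the repository's code: `mask_in_place`, `MASK_ID`, the token values, and the fixed mask positions are all made up for illustration, standing in for a masking routine that modifies its input in place.

```python
import copy

MASK_ID = 0  # hypothetical mask token id, for illustration only


def mask_in_place(tokens, positions):
    """Overwrite the given positions with MASK_ID, mutating `tokens`.

    Returns the (mutated) token list and a label list that holds the
    original token at each masked position and 0 elsewhere.
    """
    labels = [0] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]  # keep the original token as the label
        tokens[i] = MASK_ID    # in-place write: the caller's object changes
    return tokens, labels


# Case 1: the masking routine receives the caption object itself,
# so the "caption" and the "masked caption" end up identical.
aliased_caption = [101, 7, 8, 9, 102]
mlm_aliased, _ = mask_in_place(aliased_caption, [1, 3])

# Case 2: the masking routine receives an independent deep copy,
# so the original caption stays unmasked.
safe_caption = [101, 7, 8, 9, 102]
mlm_safe, mlm_labels = mask_in_place(copy.deepcopy(safe_caption), [1, 3])

print(aliased_caption == mlm_aliased)  # caption was overwritten by the masking
print(safe_caption, mlm_safe, mlm_labels)
```

In the first case the caption and the masked tokens are one and the same object, which matches what I see for 'caption_ids' and 'mlm_ids'. In the second case only the copy is masked.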
Thanks.