The results about multi-modal dialogue retrieval on PhotoChat in PaCE

JupiterTop commented 4 months ago

Hello, when I follow the quickstart steps, I am confused about how to obtain the final multi-modal dialogue retrieval results on PhotoChat in PaCE. I think the evaluation script computes scores on the validation set; am I right? When I change the split from 'val' to 'test', I only get lower scores than those reported in the paper. Any help would be appreciated. Best regards.

pldlgb commented 3 months ago

Hello, why do you think the evaluation script evaluates on the validation set? I'm a little confused. Could you provide more details?

JupiterTop commented 3 months ago

Thanks for your reply! Because the dataset is PhotoChat, the script uses the compute_old_irtr_recall() method to compute the scores. That method constructs the text dataset with text_dset = pl_module.trainer.datamodule.dms[0].make_no_false_val_dset(), which returns a validation-set dataset:

    def make_no_false_val_dset(self, image_only=False, image_list=None, image_dir=None):
        if image_list is None:
            return self.dataset_cls_no_false(
                self.data_dir,
                self.val_transform_keys,
                split="val",
                image_size=self.image_size,
                max_text_len=self.max_text_len,
                draw_false_image=0,
                draw_false_text=0,
                image_only=image_only,
                max_image_len=self.max_image_len,
                use_segment_ids=self.use_segment_ids,
                mask_prob=self.mask_prob,
                max_pred_len=self.max_pred_len,
                whole_word_masking=self.whole_word_masking,
                mask_source_words=self.mask_source_words,
                max_source_len=self.max_source_len
            )
            ...

Thanks in advance for your help.
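
For readers unfamiliar with the metric, the scores discussed here are presumably recall@K values for retrieval. The following is a minimal sketch of recall@K on a toy score matrix, not the repository's compute_old_irtr_recall() implementation; the function name recall_at_k and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold candidate ranks in the top k.

    scores[i][j] is the similarity between query i and candidate j;
    gold[i] is the index of the correct candidate for query i.
    """
    hits = 0
    for row, target in zip(scores, gold):
        # Indices of the k highest-scoring candidates for this query.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += target in top_k
    return hits / len(gold)


# Toy data (not PhotoChat): 3 dialogue queries scored against 4 candidate images.
toy_scores = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.4, 0.8, 0.1],
    [0.5, 0.6, 0.1, 0.7],
]
toy_gold = [0, 1, 0]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_scores, toy_gold, k):.3f}")
```

Because the metric depends only on the score matrix and the gold indices, the same model can yield different numbers on the validation and test splits simply because the candidate pools differ.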

pldlgb commented 3 months ago

Oh, I see. Our project is built on ViLT, so we used the same tricks as they do (see [vilt-datamodule] and [vilt-coco-dataset]). If you replace photochat_context_dev with photochat_context_test in the class PhotochatDataset(BaseDataset), what results do you get?
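
The idea behind this suggestion can be sketched as follows: the split label requested by the evaluation code ("val") is only a key, and the data actually loaded depends on which table names that key maps to. This is a hypothetical illustration, not the actual PhotochatDataset code; only the names photochat_context_dev and photochat_context_test come from this thread, while SPLIT_TO_NAMES, names_for_split and evaluate_on_test are assumed names.

```python
from typing import Dict, List

# Hypothetical mapping: the key is the split label the evaluation code asks for,
# the value is the list of table names that would be loaded for it.
SPLIT_TO_NAMES: Dict[str, List[str]] = {
    "val": ["photochat_context_dev"],
    "test": ["photochat_context_test"],
}


def names_for_split(split: str, evaluate_on_test: bool = False) -> List[str]:
    """Return table names for a split; optionally redirect 'val' to the test data."""
    if split == "val" and evaluate_on_test:
        # The label stays "val", but the test table is what gets read.
        return list(SPLIT_TO_NAMES["test"])
    return list(SPLIT_TO_NAMES[split])


print(names_for_split("val"))                         # default: dev table
print(names_for_split("val", evaluate_on_test=True))  # redirected: test table
```

With such a redirect, a helper that hard-codes split="val" would still end up scoring the test data, which is the effect of swapping the name by hand.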

pldlgb commented 3 months ago

Additionally, you might want to try using a longer context when constructing the data. (https://github.com/AlibabaResearch/DAMO-ConvAI/blob/adcb4950b123eb70266201cb5c0e10894658ec97/pace/pace/utils/write_photochat.py#L46)
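
As a rough illustration of what "a longer context" could mean, the sketch below joins the last N dialogue turns into a single context string. It does not reproduce write_photochat.py; the function build_context, its max_turns parameter and the example dialogue are all assumptions.

```python
from typing import List


def build_context(turns: List[str], max_turns: int, sep: str = " ") -> str:
    """Join the last `max_turns` utterances into one context string."""
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")
    return sep.join(turns[-max_turns:])


# Made-up dialogue, used only to show the effect of the window size.
dialogue = ["Hi!", "How was the trip?", "Great, we saw a lake.", "Send me a photo?"]

short_context = build_context(dialogue, max_turns=1)
long_context = build_context(dialogue, max_turns=3)

print(short_context)
print(long_context)
```

A larger window gives the model more dialogue history per example, at the cost of longer inputs that may be truncated by max_text_len.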
