AlibabaResearch / DAMO-ConvAI

DAMO-ConvAI: The official repository which contains the codebase for Alibaba DAMO Conversational AI.
MIT License

The results about multi-modal dialogue retrieval on PhotoChat in PaCE #131

Open JupiterTop opened 4 months ago

JupiterTop commented 4 months ago

Hello, when I follow the quickstart steps, I am confused about how to get the final results for multi-modal dialogue retrieval on PhotoChat in PaCE. I think the evaluation script computes scores on the validation set, am I right? And when I change the split from 'val' to 'test', the scores I get are lower than the ones in the paper. Hope for your help! Best regards.

pldlgb commented 3 months ago

Hello, why do you think the evaluation script evaluates the results of the validation set? I'm a little confused. Can you provide more details~

JupiterTop commented 3 months ago

Thanks for your reply! When the script is executed, because the dataset is PhotoChat, it uses the "compute_old_irtr_recall()" method to calculate the scores, and that method then constructs the text dataset: text_dset = pl_module.trainer.datamodule.dms[0].make_no_false_val_dset(). This call returns the validation set:

    def make_no_false_val_dset(self, image_only=False, image_list=None, image_dir=None):
        if image_list == None:
            return self.dataset_cls_no_false(
                mask_prob = self.mask_prob,
                max_pred_len = self.max_pred_len,
                whole_word_masking = self.whole_word_masking,
                mask_source_words = self.mask_source_words,
                max_source_len = self.max_source_len

Hope for your help~
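For context, retrieval scores of this kind are typically reported as recall@K: the fraction of queries whose correct candidate appears among the K highest-scored candidates. The following is a minimal sketch of that metric on a toy score matrix. It is not the repository's compute_old_irtr_recall(); the function name recall_at_k and the data are assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], gold: Sequence[int], k: int) -> float:
    """Fraction of queries whose gold candidate is among the k highest-scored candidates.

    Hypothetical helper for illustration; not taken from the PaCE codebase.
    """
    hits = 0
    for row, gold_idx in zip(scores, gold):
        # Rank candidate indices by descending score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if gold_idx in ranked[:k]:
            hits += 1
    return hits / len(gold)


# Toy data: 3 dialogue contexts, each scored against 4 candidate images.
scores = [
    [0.9, 0.1, 0.3, 0.2],  # gold image 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.5],  # gold image 1 is ranked 3rd
    [0.6, 0.7, 0.1, 0.3],  # gold image 3 is ranked 3rd
]
gold = [0, 1, 3]

r_at_1 = recall_at_k(scores, gold, k=1)
r_at_3 = recall_at_k(scores, gold, k=3)
```

Because the metric depends only on the score matrix and the gold indices, evaluating a different split changes the result solely through which contexts and candidate images are loaded.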

pldlgb commented 3 months ago

Oh, I see. Our project is built on ViLT, so we used the same tricks as they do [vilt-datamodule] [vilt-coco-dataset]. If you replace photochat_context_dev with photochat_context_test in the class PhotochatDataset(BaseDataset), what result do you get?
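The suggestion above amounts to changing which data file the "val" split loads, so that the validation-time retrieval evaluation runs on the test data. A minimal sketch of that idea follows, assuming the dataset class picks its file names from the split name. The mapping, the function resolve_names, the flag eval_on_test, and the name photochat_context_train are hypothetical; only photochat_context_dev and photochat_context_test appear in this thread, and the actual PhotochatDataset may be organized differently.

```python
from typing import Dict, List

# Hypothetical split-to-file mapping, for illustration only.
# "photochat_context_dev" and "photochat_context_test" are named in this thread;
# "photochat_context_train" is an assumed name.
DEFAULT_SPLIT_FILES: Dict[str, List[str]] = {
    "train": ["photochat_context_train"],
    "val": ["photochat_context_dev"],
    "test": ["photochat_context_test"],
}


def resolve_names(split: str, eval_on_test: bool = False) -> List[str]:
    """Return the data file names to load for a split.

    With eval_on_test=True, the "val" split is pointed at the test file,
    so code that only ever builds a validation dataset evaluates on test data.
    """
    if split not in DEFAULT_SPLIT_FILES:
        raise ValueError(f"unknown split: {split!r}")
    if split == "val" and eval_on_test:
        return list(DEFAULT_SPLIT_FILES["test"])
    return list(DEFAULT_SPLIT_FILES[split])


default_val = resolve_names("val")
swapped_val = resolve_names("val", eval_on_test=True)
```

With the swap in place, a call such as make_no_false_val_dset() would still request the "val" split but would receive test examples, which is why the reported numbers can differ from a run that keeps the dev file.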

pldlgb commented 3 months ago

Additionally, you might want to try using a longer context when constructing the data. (