Open JupiterTop opened 4 months ago
Hello, why do you think the evaluation script evaluates on the validation set? I'm a little confused. Could you provide more details?
Thanks for your reply! Because the dataset is PhotoChat, the script computes the metrics with the "compute_old_irtr_recall()" method, which constructs the text dataset as follows:
text_dset = pl_module.trainer.datamodule.dms[0].make_no_false_val_dset()
That method returns the validation set:
def make_no_false_val_dset(self, image_only=False, image_list=None, image_dir=None):
    if image_list == None:
        return self.dataset_cls_no_false(
            self.data_dir,
            self.val_transform_keys,
            split="val",
            image_size=self.image_size,
            max_text_len=self.max_text_len,
            draw_false_image=0,
            draw_false_text=0,
            image_only=image_only,
            max_image_len=self.max_image_len,
            use_segment_ids=self.use_segment_ids,
            mask_prob=self.mask_prob,
            max_pred_len=self.max_pred_len,
            whole_word_masking=self.whole_word_masking,
            mask_source_words=self.mask_source_words,
            max_source_len=self.max_source_len,
        )
    ...
I hope you can help.
Oh, I see. Our project is built on ViLT, so we used the same tricks as they do (see [vilt-datamodule] and [vilt-coco-dataset]). What results do you get if you replace photochat_context_dev with photochat_context_test in the class PhotochatDataset(BaseDataset)?
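The swap suggested above could be sketched roughly as follows. This is only an illustration, not the actual PhotochatDataset code: the helper photochat_table_names, its eval_on_test flag, and the name photochat_context_train are assumptions made for the example; only photochat_context_dev and photochat_context_test appear in this thread.

```python
# Hypothetical sketch of a split-to-table-name mapping; the real
# PhotochatDataset constructor in the repository may look different.
def photochat_table_names(split, eval_on_test=False):
    """Return the list of table names to load for a given split."""
    if split == "train":
        # Assumed name, chosen by analogy with the dev/test names.
        return ["photochat_context_train"]
    if split == "val":
        # With eval_on_test=True the "val" split points at the test table,
        # which is the replacement suggested in this thread.
        if eval_on_test:
            return ["photochat_context_test"]
        return ["photochat_context_dev"]
    if split == "test":
        return ["photochat_context_test"]
    raise ValueError(f"unknown split: {split!r}")


print(photochat_table_names("val"))
print(photochat_table_names("val", eval_on_test=True))
```

With a mapping like this, code that hard-codes split="val" (such as make_no_false_val_dset) would read the test table once the flag is set, without changing the calling code.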
Additionally, you might want to try using a longer context when constructing the data. (https://github.com/AlibabaResearch/DAMO-ConvAI/blob/adcb4950b123eb70266201cb5c0e10894658ec97/pace/pace/utils/write_photochat.py#L46)
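As a rough illustration of what "a longer context" could mean, here is a minimal sketch that joins the last few dialogue turns into one context string. It is not the code in write_photochat.py: the function build_context and its parameters max_turns and sep are hypothetical names used only for this example.

```python
# Hypothetical sketch: build a text context from the most recent turns.
# The actual logic in write_photochat.py may differ.
def build_context(turns, max_turns=3, sep=" "):
    """Join the last `max_turns` utterances into a single context string."""
    if max_turns <= 0:
        # Guard: turns[-0:] would otherwise return every turn.
        return ""
    recent = turns[-max_turns:]
    return sep.join(turn.strip() for turn in recent)


dialogue = ["Hi!", "How was the trip?", "Great, we saw the lake.", "Let me show you."]
short_context = build_context(dialogue, max_turns=1)
long_context = build_context(dialogue, max_turns=3)
print(short_context)
print(long_context)
```

A larger max_turns gives the retrieval model more dialogue history per example, at the cost of longer text inputs that may hit the max_text_len limit.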
Hello, when I follow the quickstart steps, I am confused about how to get the final results for multi-modal dialogue retrieval on PhotoChat in PaCE. I think the evaluation script computes scores on the validation set; am I right? Also, when I change the split from 'val' to 'test', I get lower scores than those reported in the paper. I hope you can help! Best regards.