Some questions about the CrossTask evaluation code

xiangyh9988 commented 2 years ago

Hi, I have some questions about the CrossTask evaluation and would appreciate your help.

1. Is annot generated by the function read_assignment? I'm not sure, because line 242 is commented out. My guess is that you generated annot with this function and then commented the call out when evaluating on CrossTask. Is that right?
2. Is the length of annot equal to the length of the 2D feature? And is video_3d downsampled via adaptive_max_pool1d so that it has the same length as video_2d for concatenation?
3. During evaluation, in the function get_recall in eval_cross.py, does setting args.recall_frame to 0 matter? Line 127 and line 129 seem to do the same computation.
4. In the following code block, does args.mining refer to the evaluation on the Mining YouTube dataset? During evaluation, is args.recall_frame set to 0 or 1 to choose between the prediction functions cvpr19_predict and arg_max_predict? https://github.com/brian7685/Multimodal-Clustering-Network/blob/808948b4007c47de82bb8e371277130e5b901cad/eval_cross.py#L232-L243

Sorry to bother you. Thanks for your patience and help in advance!

brian7685 commented 2 years ago

1. The annotations were shared by the author of CrossTask: https://www.rocq.inria.fr/cluster-willow/dzhukov/crosstask_annotations.tar.gz
2. Yes, that is correct. We followed the author of CrossTask on this (_adaptive_maxpool1d is thus just an ugly hack to ensure that both arrays are of the same length).
3. The default is recall_frame==0; arg_max_predict is used for a different evaluation test.
4. Yes, args.mining is for the evaluation on the Mining YouTube dataset. Yes, we use cvpr19_predict in the end.
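
For readers following the second answer, here is a minimal pure-Python sketch of the length-matching idea: the 3D feature sequence is max-pooled along time until it has as many steps as the 2D sequence, and the two are then concatenated per step. This is an illustration, not the repository's code. The function names `adaptive_max_pool_1d` and `concat_2d_3d` are hypothetical, the windowing rule (floor for the window start, ceiling for the window end) is an assumption modeled on how adaptive pooling is usually defined, and the repository itself presumably calls the PyTorch function on tensors rather than on lists.

```python
def adaptive_max_pool_1d(frames, out_len):
    """Max-pool a [T][D] feature sequence along time to exactly out_len steps.

    Assumed windowing: output step i covers input frames
    floor(i * T / out_len) up to (not including) ceil((i + 1) * T / out_len).
    """
    in_len = len(frames)
    if in_len == 0 or out_len <= 0:
        raise ValueError("need a non-empty sequence and a positive out_len")
    pooled = []
    for i in range(out_len):
        start = (i * in_len) // out_len
        # Integer ceiling division, so no floating-point rounding is involved.
        end = ((i + 1) * in_len + out_len - 1) // out_len
        window = frames[start:end]
        # Take the maximum of each feature dimension over the window.
        pooled.append([max(column) for column in zip(*window)])
    return pooled


def concat_2d_3d(video_2d, video_3d):
    """Resample video_3d to len(video_2d) steps, then concatenate per step."""
    video_3d_resampled = adaptive_max_pool_1d(video_3d, len(video_2d))
    return [f2 + f3 for f2, f3 in zip(video_2d, video_3d_resampled)]


if __name__ == "__main__":
    # Toy data: 3 steps of 2-dim "2D" features, 6 steps of 1-dim "3D" features.
    video_2d = [[1, 2], [3, 4], [5, 6]]
    video_3d = [[1], [5], [2], [3], [9], [4]]
    print(concat_2d_3d(video_2d, video_3d))
    # prints [[1, 2, 5], [3, 4, 3], [5, 6, 9]]
```

After this step the combined sequence has one row per 2D feature step, which is consistent with annot having the same length as the 2D feature.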

xiangyh9988 commented 2 years ago

Thanks for sharing and for the guidance. I have no further questions.
