andrefaraujo / videosearch

Large-scale video retrieval using image queries.

Scoring #7

Closed dimsamaras closed 8 years ago

dimsamaras commented 8 years ago

Good morning,

I have a question concerning the mAP results presented in the "Large-Scale Query-by-Image Video Retrieval Using Bloom Filters" paper, which uses the same code. Is mAP computed @100 or @1? I ask because in "light_dataset_public.txt" the ground-truth results per query do not sum to 100, and the same goes for "vb_light_dataset_public.txt". In the sample code you use scoring @1 based on the text files; that part is clear.

In the "TEMPORAL AGGREGATION FOR LARGE-SCALE QUERY-BY-IMAGE VIDEO RETRIEVAL" paper, regarding the mAP@100 results, you mention "...compared to a baseline frame-based scheme that does not use asymmetric comparisons." at the end. Do you compare the raw frame descriptors (32D, 128D), or the binarized features?

Thanks a lot, Dimitris

andrefaraujo commented 8 years ago

1) mAP is @100. I think there is a misunderstanding about what that means.

I ask because in "light_dataset_public.txt" the ground-truth results per query do not sum to 100, and the same goes for "vb_light_dataset_public.txt".

Computing mAP@100 does not mean that there will be at least 100 ground-truth results per query; it only means that we use the top 100 results for each query to compute mAP. Let me illustrate with an example: say query 0 has 2 ground-truth videos, and say that after the retrieval process the ground-truth videos are ranked in positions 51 and 101. This means that the AP for this query will be 1/51 (the second ground-truth video is not ranked within the top 100, so it is not taken into account).
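A minimal sketch of this computation (assuming the normalization convention implied by the 1/51 figure above, i.e. dividing by the number of ground-truth items actually retrieved within the cutoff; the function name `ap_at_k` is hypothetical, not taken from the repository's code):

```python
def ap_at_k(ranked, relevant, k=100):
    """Average precision over the top-k retrieved results.

    ranked:   list of video ids, ordered by decreasing score.
    relevant: set of ground-truth video ids for this query.

    Normalizes by the number of relevant items found within the
    top k, so a query whose only retrieved ground-truth video
    sits at rank 51 scores exactly 1/51.
    """
    hits = 0
    precision_sum = 0.0
    for rank, video in enumerate(ranked[:k], start=1):
        if video in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

# Example from above: 2 ground-truth videos, ranked at positions
# 51 and 101; only the first falls inside the top 100.
ranked = list(range(200))            # dummy ranked list of video ids
relevant = {50, 100}                 # the ids at ranks 51 and 101
print(ap_at_k(ranked, relevant))     # 1/51, roughly 0.0196
```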

In the sample code you use scoring @1 based on the text files; that part is clear.

I am not sure what you are referring to, but my guess is that you mean the Python script evaluate_scene_retrieval. That script computes both mAP@100 and Precision@1. In all papers, we report the mAP@100 result given by this script (the 100 comes from setting short_list_size to that value).
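To sketch how those two numbers relate (hypothetical helper names; this is not the repository's evaluate_scene_retrieval script): averaging the per-query AP over the short list gives mAP@100, while Precision@1 only checks the top-ranked result.

```python
def ap_at_k(ranked, relevant, k=100):
    # Average precision over the top-k results, normalized by the
    # number of relevant items retrieved within the cutoff.
    hits, precision_sum = 0, 0.0
    for rank, video in enumerate(ranked[:k], start=1):
        if video in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def precision_at_1(ranked, relevant):
    # 1.0 if the top-ranked result is a ground-truth video, else 0.0.
    return 1.0 if ranked and ranked[0] in relevant else 0.0

def evaluate(queries, short_list_size=100):
    # queries: list of (ranked_video_ids, ground_truth_id_set) pairs.
    n = len(queries)
    mean_ap = sum(ap_at_k(r, g, short_list_size) for r, g in queries) / n
    mean_p1 = sum(precision_at_1(r, g) for r, g in queries) / n
    return mean_ap, mean_p1

queries = [
    (["a", "b", "c"], {"a"}),   # ground truth at rank 1: AP = 1,   P@1 = 1
    (["x", "y", "z"], {"z"}),   # ground truth at rank 3: AP = 1/3, P@1 = 0
]
print(evaluate(queries))        # mAP = 2/3, mean P@1 = 0.5
```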

2) On your question:

In the "TEMPORAL AGGREGATION FOR LARGE-SCALE QUERY-BY-IMAGE VIDEO RETRIEVAL" paper, regarding the mAP@100 results, you mention "...compared to a baseline frame-based scheme that does not use asymmetric comparisons." at the end. Do you compare the raw frame descriptors (32D, 128D), or the binarized features?

The sentence you refer to is in the conclusion of the paper, and refers to comparing the results obtained by the SCFV_scene_LOC_2048 method against those obtained by the SCFV_frames_no_asym method, as shown in Fig. 6 of the paper. In this paper, all global descriptors (binarized Fisher vectors) are compared in binarized format. As reported in Section 2 of the paper, SIFT descriptors are first reduced via PCA from 128D to 32D.