gkordo / s2vs

Authors' official PyTorch implementation of "Self-Supervised Video Similarity Learning" [CVPRW 2023]
MIT License

Test on a custom dataset #2

Closed. benemana closed this issue 8 months ago.

benemana commented 11 months ago

Hi, and thank you for this amazing work.

I was trying to test your code on a custom toy dataset with a couple of "original" videos and a couple of "altered" videos derived from them.

Any suggestions on how to structure the dataset? I downloaded the VCDB dataset and, looking at your vcdb.py, I suppose that you serialized the dataset into a pickle file. Can you provide some additional info about this process?

I thought of organizing my custom dataset similarly to VCDB, and from your code I see that you expect the following structure:

self.queries = dataset['queries']
self.positives = dataset['positives']
self.dataset = dataset['dataset']

I suppose that 'queries' are the altered videos and 'dataset' the original videos, but:

1) What does 'positives' stand for? If it is the ground truth, maybe I can skip it, since in this real-world scenario I don't have GT.
2) In which format should I encode the videos under these keys? Are they simply lists containing the paths to the videos? For instance: dataset['queries'] = ['.../mydataset/video1.mp4', '.../mydataset/video2.mp4', ...]?

Thank you.

gkordo commented 11 months ago

Hi @benemana. Thank you for your kind words. Could you please describe a bit more what you want to do? Since you do not have labels for your dataset, I don't think vcdb.py is the script you need. It is solely for the calculation of mAP and μAP on an evaluation dataset (i.e. VCDB in this case) based on ground-truth labels. Since there are no labels in your case, it cannot be used.
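Just for context on what ground-truth annotations look like in that setting, here is a hypothetical sketch of such a pickle. The key names follow the snippet you quoted, but the value types (video IDs, and a dict mapping each query to its positives) are assumptions for illustration, not the exact format expected by vcdb.py:

import pickle

# Hypothetical ground-truth annotations for an evaluation dataset.
# Key names follow the snippet quoted above; the value types are assumptions.
annotations = {
    'queries': ['query_0001', 'query_0002'],          # IDs of the query videos
    'positives': {                                    # ground-truth matches per query
        'query_0001': ['ref_0042', 'ref_0107'],
        'query_0002': ['ref_0013'],
    },
    'dataset': ['ref_0001', 'ref_0013', 'ref_0042', 'ref_0107'],  # IDs of all database videos
}

with open('my_eval_dataset.pickle', 'wb') as f:
    pickle.dump(annotations, f)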

In case you want to load the video tensors, please have a look at the VideoDatasetGenerator. It loads a list of videos according to the provided arguments. You could instantiate it as follows:

dataset = VideoDatasetGenerator(dataset_path, video_list, pattern="{id}", loader="video", fps=1, crop=224, resize=256)

dataset_path - the parent path to the dataset videos
video_list - list of the IDs of your videos
pattern - structure pattern according to which your videos are stored based on their video ID. You can leave it as {id} in your case.
loader - format in which the videos are stored: video if they are videos, frame if they are frames.
fps - fps at which to load the videos
crop - the dimension for center cropping
resize - the dimension for resizing before center cropping
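If it helps, here is a minimal usage sketch under a few assumptions: the import path, the example path and IDs, and the per-item output (a frame tensor plus its video ID) are illustrative, not taken verbatim from the repo:

# Adjust the import to wherever VideoDatasetGenerator lives in the repo.
from datasets.generators import VideoDatasetGenerator

dataset_path = '/path/to/my_videos'   # hypothetical parent folder of your videos
video_list = ['video1', 'video2']     # hypothetical video IDs

dataset = VideoDatasetGenerator(dataset_path, video_list, pattern="{id}",
                                loader="video", fps=1, crop=224, resize=256)

# Iterate directly over the generator; the (frames, video_id) ordering of each
# item is an assumption about its interface.
for frames, video_id in dataset:
    print(video_id, frames.shape)

Assuming it follows the standard torch Dataset interface, it can also be wrapped in a torch.utils.data.DataLoader for batching and parallel loading.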

Hope this helps you figure out what you need; otherwise, let me know.