facebookresearch / jepa

PyTorch code and models for V-JEPA self-supervised learning from video.

video_dataset: Random sample assignment #64

Open thomasf1 opened 6 months ago

thomasf1 commented 6 months ago

In the file video_dataset.py, the __getitem__ method does not make sense to me:

    def __getitem__(self, index):
        sample = self.samples[index]

        # Keep trying to load videos until you find a valid sample
        loaded_video = False
        while not loaded_video:
            buffer, clip_indices = self.loadvideo_decord(sample)  # [T H W 3]
            loaded_video = len(buffer) > 0
            if not loaded_video:
                index = np.random.randint(self.__len__())
                sample = self.samples[index]

        # Label/annotations for video
        label = self.labels[index]

        def split_into_clips(video):
            """ Split video into a list of clips """
            fpc = self.frames_per_clip
            nc = self.num_clips
            return [video[i*fpc:(i+1)*fpc] for i in range(nc)]

        # Parse video into frames & apply data augmentations
        if self.shared_transform is not None:
            buffer = self.shared_transform(buffer)
        buffer = split_into_clips(buffer)
        if self.transform is not None:
            buffer = [self.transform(clip) for clip in buffer]

        return buffer, label, clip_indices

Particularly, the following:

            if not loaded_video:
                index = np.random.randint(self.__len__())
                sample = self.samples[index]

In the current setup (at least in eval), samples are file paths to videos. So when a video fails to load, we're replacing it with a random other video and returning that as the video at the current index, with the label for the current index?

Worst case: this could mess up validation (if the labels are used).
Best case: random duplicate videos.

Or maybe I'm missing something?
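
One way to check which label ends up paired with the substitute video is a stripped-down reproduction. The sketch below is a toy under stated assumptions: ToyDataset, its fake load method, the file names, and the string labels are all hypothetical stand-ins, not code from the repository. Only the retry loop and the label lookup are copied from the snippet above. In this reduced form, index is reassigned inside the loop before self.labels[index] is read, so the returned label follows the substitute sample. That would point to the "random duplicate videos" case rather than a mislabeled one, though the real loader may differ in ways this toy does not capture.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for the video dataset; not code from the repo."""

    def __init__(self, samples, labels, broken):
        self.samples = samples
        self.labels = labels
        self.broken = set(broken)  # paths that pretend to fail decoding

    def __len__(self):
        return len(self.samples)

    def load(self, sample):
        # Assumption: mimics a loader that returns an empty buffer on failure.
        return [] if sample in self.broken else [sample]

    def __getitem__(self, index):
        sample = self.samples[index]

        # Same retry logic as in the quoted __getitem__
        loaded_video = False
        while not loaded_video:
            buffer = self.load(sample)
            loaded_video = len(buffer) > 0
            if not loaded_video:
                index = np.random.randint(self.__len__())
                sample = self.samples[index]

        # index may have been reassigned above, so this reads the
        # label of whichever sample was actually loaded
        label = self.labels[index]
        return buffer, label


ds = ToyDataset(
    samples=["a.mp4", "b.mp4", "c.mp4"],
    labels=["label_a", "label_b", "label_c"],
    broken=["a.mp4"],
)
buffer, label = ds[0]  # index 0 is "broken", so a substitute is drawn
print(buffer, label)
```

Requesting index 0 here never returns a.mp4; it returns b.mp4 or c.mp4 together with that file's own label. The substitute video is still one that the sampler will also visit at its own index, so it can appear twice in an epoch.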