Correspondence between annotation file names and processed video names?

celebv-text / CelebV-Text

(CVPR 2023) CelebV-Text: A Large-Scale Facial Text-Video Dataset

Thanks a lot for providing the excellent dataset!

I am trying to download and process the face video dataset, but I find it hard to work out the correspondence between annotation file names and processed video names.

For instance, after processing videos with 'download_and_process.py', I get 2 YouTube videos with the same name 'hDf4PjXl64Q_35' from 'clips_set1' and 'clips_set2' in the celebvtext_info.json file. Meanwhile, the annotation files (action, emotion) contain 2 similar annotation names ('hDf4PjXl64Q_35_2' and 'hDf4PjXl64Q_35_4'). How can I get the correspondence between these 2 videos and the 2 annotation files?
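
To make the ambiguity concrete, here is a minimal sketch of how I group annotation names under a processed video name. It assumes (this is only my guess, not something the repository documents) that an annotation name is the video name followed by '_' and an integer index. The function name `group_annotations_by_clip` is hypothetical.

```python
from collections import defaultdict


def group_annotations_by_clip(annotation_names):
    """Group annotation names by the video name they appear to belong to.

    Assumption (not confirmed by the dataset authors): an annotation name
    is '<video_name>_<integer index>', e.g. 'hDf4PjXl64Q_35_2'.
    """
    groups = defaultdict(list)
    for name in annotation_names:
        # Split on the LAST underscore only, since video IDs may contain '_'.
        base, _, suffix = name.rpartition("_")
        if base and suffix.isdigit():
            groups[base].append(name)
        else:
            # Name does not follow the assumed pattern; keep it under itself.
            groups[name].append(name)
    return dict(groups)


groups = group_annotations_by_clip(["hDf4PjXl64Q_35_2", "hDf4PjXl64Q_35_4"])
```

With the two names above, both annotations land under 'hDf4PjXl64Q_35', which shows the problem: the grouping alone cannot tell which annotation belongs to which of the two videos.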

I find that 'clip_set1' contains 67025 videos with distinct names, and 'clip_set2' contains 2975 (70000-67025) videos whose names all appear in 'clip_set1'. The current 'download_and_process.py' file only contains code for downloading 'clip_set1'.
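
For reference, the overlap check I describe can be sketched as follows. The function `summarize_overlap` is a hypothetical helper, and it takes plain lists of names because I do not want to assume the exact key layout of celebvtext_info.json here; the toy names in the example are made up for illustration.

```python
def summarize_overlap(set1_names, set2_names):
    """Count distinct names in each clip set and how many are shared.

    Inputs are plain iterables of video names; how they are extracted
    from celebvtext_info.json is left out on purpose (layout not assumed).
    """
    set1 = set(set1_names)
    set2 = set(set2_names)
    return {
        "distinct_in_set1": len(set1),
        "distinct_in_set2": len(set2),
        "shared": len(set1 & set2),          # names present in both sets
        "only_in_set2": len(set2 - set1),    # names missing from set 1
    }


# Toy example with made-up names, not real dataset entries.
summary = summarize_overlap(["a_1", "b_2", "c_3"], ["a_1", "c_3"])
```

If 'only_in_set2' is 0 and 'shared' equals 'distinct_in_set2', every name in the second set also appears in the first, which is what I observe on the real file.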

Thanks a lot!

Correspondence between annotation file names and processed video names? #5