OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License
3k stars · 247 forks

Ego4d dataset split unavailable #86

Closed LiJiaqi96 closed 8 months ago

LiJiaqi96 commented 9 months ago

Hi, thanks for your great work on VideoChat2!
I tried to organize the Ego4D dataset used in the paper, but I found that there are several splits for each video, and the split information is available neither on the Ego4D website nor in this repo.
Is there any information about how the splits were performed? Thanks!

An example (the question is how to obtain "split_0.mp4"):
d250521e-5197-44aa-8baa-2f42b24444d2/split_0.mp4

Andy1621 commented 9 months ago

Please check the original JSON here. You may need to download the videos from Ego4D and split them yourself.

LiJiaqi96 commented 9 months ago

Thanks for your quick reply!!
BTW, the same issue occurs in the YouCook2 dataset. I observed that in YouCook2, the split is determined by the "segment" field in the original JSON file. Does it hold frame indices? Thanks :)

Andy1621 commented 9 months ago

Yes. The segment gives the start second and end second of the clip.
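For example, the YouCook2 annotation value [74, 83] (the annotation file youcookii_annotations_trainval.json appears later in this thread) marks the span from second 74 to second 83. A minimal sketch of reading it, with a hypothetical annotation entry:

```python
# Sketch: a YouCook2 "segment" holds [start_sec, end_sec] in seconds.
# The anno dict below is a hypothetical entry in the trainval JSON layout.
anno = {"id": 0, "sentence": "add oil to the pan", "segment": [74, 83]}

start_sec, end_sec = anno["segment"]
duration = end_sec - start_sec  # clip length used when cutting the video
print(start_sec, duration)  # 74 9
```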

LiJiaqi96 commented 9 months ago

Thanks again for your helpful information!

cathyxl commented 9 months ago

Hi @Andy1621, I found many splits for one video_uid in ego4d_nlp_qa.json. I'm wondering how you index the splits. Do splits with earlier clip start seconds get smaller index numbers?

Andy1621 commented 9 months ago

@cathyxl I simply split the video according to the annotations. For the same video_uid, different clip_start_sec and clip_end_sec values lead to different splits, generating split_0, split_1, and so on.
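In other words, the indices follow the order in which a video's annotations appear. A small illustration of that scheme (the annos list and the grouping logic are my assumptions; only the field names come from ego4d_nlp_qa.json):

```python
from collections import defaultdict

# Sketch: the n-th annotated (clip_start_sec, clip_end_sec) pair seen
# for a video_uid becomes split_n. This is an illustration of the
# indexing described above, not the authors' verified script.
annos = [
    {"video_uid": "v1", "clip_start_sec": 55.8, "clip_end_sec": 60.3},
    {"video_uid": "v1", "clip_start_sec": 62.7, "clip_end_sec": 72.2},
    {"video_uid": "v2", "clip_start_sec": 7.1, "clip_end_sec": 8.5},
]

counters = defaultdict(int)  # per-video split counter
split_name = {}
for a in annos:
    uid = a["video_uid"]
    split_name[(uid, a["clip_start_sec"])] = f"split_{counters[uid]}.mp4"
    counters[uid] += 1

print(split_name[("v1", 62.7)])  # split_1.mp4
```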

cathyxl commented 9 months ago

> @cathyxl I simply split the video according to the annotations. For the same video_uid, different clip_start_sec and clip_end_sec values lead to different splits, generating split_0, split_1, and so on.

Does that mean the clip indices are assigned by their order of appearance in the annotation file?

Andy1621 commented 8 months ago

Yes, but you can also split the clips yourself and build the JSON file accordingly.

cathyxl commented 8 months ago

> Yes, but you can also split the clips yourself and build the JSON file accordingly.

Could you kindly provide the script used to split the Ego4D videos? I found some errors when splitting the videos myself, and performance will suffer a lot if the split videos do not match the instruction data samples.

Andy1621 commented 8 months ago

I'm sorry that I cannot find the full scripts. However, I found some ffmpeg commands, as follows:

mkdir -p your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2
ffmpeg -ss 55.8300286 -t 4.4510000000000005 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_0.mp4
ffmpeg -ss 62.7295786 -t 9.501449999999984 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_1.mp4
ffmpeg -ss 150.5177086 -t 3.9923200000000065 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_2.mp4
ffmpeg -ss 7.1810286 -t 1.3579999999999997 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_3.mp4
ffmpeg -ss 214.81002859999998 -t 11.640000000000015 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_4.mp4
ffmpeg -ss 227.0350286 -t 14.85499999999999 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_5.mp4
ffmpeg -ss 254.8062886 -t 8.893740000000008 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_6.mp4
ffmpeg -ss 7.5185686 -t 5.502459999999999 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_7.mp4
ffmpeg -ss 120.70256859999999 -t 2.3184600000000017 -accurate_seek -i your_path/EgoQA/raw_videos/d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4 -c:v libx264 -c:a aac -strict experimental -an your_path/EgoQA/split_videos/d250521e-5197-44aa-8baa-2f42b24444d2/split_8.mp4

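These repeated invocations can also be generated programmatically. A minimal Python sketch based on the commands above (make_split_cmd, the directory names, and the 0.mp4 naming are illustrative assumptions, not the authors' script):

```python
import os

def make_split_cmd(raw_dir, out_dir, video_uid, idx, start_sec, end_sec):
    """Build the ffmpeg argument list for one clip, mirroring the commands
    above: -ss is the start second, -t the duration, and -an drops audio
    (which also makes the original -c:a aac flag moot)."""
    src = os.path.join(raw_dir, video_uid, "0.mp4")
    dst = os.path.join(out_dir, video_uid, f"split_{idx}.mp4")
    return ["ffmpeg", "-ss", str(start_sec), "-t", str(end_sec - start_sec),
            "-accurate_seek", "-i", src, "-c:v", "libx264", "-an", dst]

cmd = make_split_cmd("raw_videos", "split_videos",
                     "d250521e-5197-44aa-8baa-2f42b24444d2", 0, 55.83, 60.28)
# run with subprocess.call(cmd), after os.makedirs on the output directory
```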
cathyxl commented 8 months ago

In your command, -ss is the start second and -t is the duration? There are no durations in ego4d_nlp_qa.json. Did you use clip_end_sec - clip_start_sec to get the duration?

My other problem is that some clips in ego4d_nlp_qa.json have almost the same clip_start_sec and clip_end_sec. Did you include these clips?

Andy1621 commented 8 months ago

Yes, I just use the difference as the duration. For your second problem, I have not checked the overlap between different clips, but I think it's normal for one clip to match multiple QAs.
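One way to surface the degenerate clips mentioned above before splitting is to drop annotations whose start and end nearly coincide. A sketch (the 0.1 s threshold and the annos list are my assumptions):

```python
# Sketch: flag clips whose duration is (near) zero, which would produce
# empty or unreadable split videos. MIN_DURATION is an assumed threshold.
MIN_DURATION = 0.1

def usable(anno):
    return anno["clip_end_sec"] - anno["clip_start_sec"] >= MIN_DURATION

annos = [
    {"clip_start_sec": 55.83, "clip_end_sec": 60.28},
    {"clip_start_sec": 12.00, "clip_end_sec": 12.00},  # degenerate clip
]
kept = [a for a in annos if usable(a)]
print(len(kept))  # 1
```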

cathyxl commented 8 months ago

I find my downloaded Ego4D videos are named d250521e-5197-44aa-8baa-2f42b24444d2.mp4 instead of d250521e-5197-44aa-8baa-2f42b24444d2/0.mp4. Is there any problem? [screenshot]

Andy1621 commented 8 months ago

@cathyxl I'm compressing the videos and will upload them later~

cathyxl commented 8 months ago

That would be great! Thanks a lot. BTW, will you also upload the videos of the other datasets? I found the downloaded video paths of InternVid do not match those in the 1.9M instruction data. Can you also show how to process the InternVid files?

Andy1621 commented 8 months ago

Yes, I can also upload the VideoChat2 conversation part~

Andy1621 commented 8 months ago

@cathyxl For EgoQA videos, download them from this link. For VideoChat2 conversation videos, download them from this link.

Besides, for splitting YouCook data, please follow the code:

import os
import subprocess
import json

def change_time(segment):
    # Convert [start_sec, end_sec] into an "H:M:S" start time and a duration.
    duration = segment[1] - segment[0]
    hour = segment[0] // 3600
    minute = (segment[0] - 3600 * hour) // 60
    second = segment[0] % 60
    start = f"{hour}:{minute}:{second}"
    return start, duration

def process_video(src_path, des_path, start, duration):
    # Skip clips that were already extracted (des_path is the full output .mp4 path).
    if not os.path.exists(des_path):
        cmd = f'ffmpeg -ss {start} -t {duration} -accurate_seek -i "{src_path}" -c:v libx264 -c:a aac -strict experimental -b:a 98k "{des_path}"'
        subprocess.call(cmd, shell=True)

path = "user/youcook2/raw_videos"
split_lst = ['training', 'validation', 'testing']
total_file = {}
for split in split_lst:
    dir_list = os.listdir(os.path.join(path, split))
    for dir in dir_list:
        file_list = os.listdir(os.path.join(path, split, dir))
        for file in file_list:
            name = file.split('.')[0]
            total_file[name] = os.path.join(path, split, dir, file)

json_data = json.load(open("user/youcook2/youcookii_annotations_trainval.json", "r"))

des = "user/youcook2/split_videos"
caption_dict = {
    "training": [],
    "validation": [],
    "testing": []
}
for name, src_path in total_file.items():
    suffix = '/'.join(src_path.split('/')[-3:]).split('.')[0]
    des_dir = os.path.join(des, suffix)
    print(des_dir)
    if not os.path.exists(des_dir):
        os.makedirs(des_dir)
    for anno in json_data['database'][name]['annotations']:
        split = json_data['database'][name]['subset']
        idx = anno['id']
        caption = anno['sentence']
        segment = anno['segment']
        start, duration = change_time(segment)
        des_path = os.path.join(des_dir, f"split_{idx}.mp4")
        process_video(src_path, des_path, start, duration)
        caption_dict[split].append({
            "video": suffix + '/' + f"split_{idx}.mp4",
            "caption": caption
        })

cathyxl commented 8 months ago

Thanks a lot, @Andy1621! BTW, I have the same problem with Kinetics-710. I found my downloaded video paths for Kinetics 400, 600, and 700 cannot match the paths in the 1.9M instruction data. Can you also provide the preprocessing scripts?

Andy1621 commented 8 months ago

@cathyxl Hi! Please check our raw Kinetics annotation files here. As for the raw videos, you may need to find the related links from the official websites, from cvfoundation, or from Open DataLab. It may be illegal for us to share the Kinetics videos directly.

BTW, it's normal that some videos are missing, since their YouTube links are no longer available.

cathyxl commented 8 months ago

@Andy1621 I see~ I find 51 videos missing in my downloaded files; I think that might be OK. Besides, I am also looking into the image paths. I found that vqav2, vqav2_chinese, st_vqa, okvqa, okvqa_chinese, aokvqa, and imagenet have some or all data paths in the pattern train/xxxx.jpg (x are digits), which are neither COCO image paths nor ImageNet paths. Can you share how these image paths are organized?

I noticed that M3IT provides the images as base64 strings; are these paths related to those base64 strings?

Andy1621 commented 8 months ago

Yes. Most of the image files are from M3IT, and we transform the base64 string (image_str) into an image named by its img_id.

For files that do not have an img_id, we use the line_id, which is generated by enumerate(line).
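The decoding described above can be sketched roughly as follows (the image_str/img_id field names come from the comment; save_images and the .jpg extension are my assumptions, not the authors' exact script):

```python
import base64
import os

def save_images(lines, out_dir):
    """Decode base64 image strings to files, naming each by img_id when
    present and otherwise by its enumeration index (the line_id)."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for line_id, line in enumerate(lines):
        name = line.get("img_id", line_id)  # fall back to line_id
        path = os.path.join(out_dir, f"{name}.jpg")
        with open(path, "wb") as f:
            f.write(base64.b64decode(line["image_str"]))
        paths.append(path)
    return paths
```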

Andy1621 commented 8 months ago

And thanks for the notice: vqav2_chinese and okvqa_chinese were uploaded but not used. I will remove them from HF later.

Andy1621 commented 8 months ago

@cathyxl I have found an error in the YouCook2 videos: they were all split at the same hard-coded segment, start, duration = change_time([74, 83])... I will split the videos again and update them~~

cathyxl commented 8 months ago

hi~ @Andy1621, have you uploaded the YouCook2 videos?

Andy1621 commented 8 months ago

@cathyxl I have updated the YouCook2 videos at the same link. Besides, train.json has been updated, since some videos could not be read.

Furthermore, I have uploaded a random train_80k.json for webvid_caption and train_100k.json for coco_caption, which are smaller and lead to similar results. Check them in HF.

cathyxl commented 8 months ago

@Andy1621 Can you pin the link to the YouCook2 videos here? I cannot find it.

yinanhe commented 8 months ago

@cathyxl huggingface

cathyxl commented 8 months ago

@yinanhe this seems to be a link to ego4d, how about the youcook2?

yinanhe commented 8 months ago

@cathyxl If you downloaded the zip file named "egoqa_split_videos.zip" between 11:00 AM on January 23, 2024 (UTC+8) and 11:00 AM on January 24, 2024 (UTC+8), there's no need to re-download it: the videos inside it are actually for YouCook. I'm sorry for this typo; youcook_split_videos_parta and youcook_split_videos_partb are normal now. From now on, the videos in egoqa_split_videos.zip are the ones for Ego4D.

yinanhe commented 8 months ago

It seems that the issue has been fixed. If you still have any problems, please feel free to reopen this issue.

Andy1621 commented 5 months ago

For those who are interested in YouCook2, I have updated the JSON files in HF.

pritamqu commented 1 month ago

> @cathyxl If you downloaded the zip file named "egoqa_split_videos.zip" between 11:00 AM on January 23, 2024 (UTC+8) and 11:00 AM on January 24, 2024 (UTC+8), there's no need to re-download it. The videos inside it are for YouCook. I'm sorry for this typo, this link is normal now. From now on, the videos in egoqa_split_videos.zip are the ones for ego4d.

Hey, the YouCook link seems to be broken again. I did find an HF dataset link here: https://huggingface.co/datasets/ynhe/videochat2_data/blob/main/youcook_split_videos.zip.partab; I'm not sure we can extract from it without the other parts. Could you please have a look?

yinanhe commented 1 month ago

> hey the youcook link seems to be broken again - although I find an hf/dataset link here: https://huggingface.co/datasets/ynhe/videochat2_data/blob/main/youcook_split_videos.zip.partab; not sure if we can extract from it without other parts, could you please have a look?

@pritamqu Sorry, partaa was not uploaded successfully due to network problems; it has been uploaded now. See the link https://huggingface.co/datasets/ynhe/videochat2_data/resolve/main/youcook_split_videos.zip.partaa

You need to execute the following commands to extract the archive:

cat youcook_split_videos.zip.part* > youcook_split_videos.zip
unzip youcook_split_videos.zip