OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/
MIT License
2.85k stars 230 forks

Using the CLEVR dataset #169

Open LiJiaqi96 opened 2 months ago

LiJiaqi96 commented 2 months ago

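Hello, which dataset exactly is image_reasoning - clevr? Following the citation in the paper, I found https://cs.stanford.edu/people/jcjohns/clevr/ and downloaded [CLEVR v1.0 (18 GB)], but after unpacking it the images do not correspond to the entries in the json.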

Andy1621 commented 2 months ago

Hello, all the image data we use is the data provided in M3IT.

LiJiaqi96 commented 2 months ago

Thanks. I looked at M3IT, and the image field in its json is one long string. How do I map those strings to the "train/39065.jpg" form that VideoChat2 provides?

Andy1621 commented 2 months ago

We followed the annotations provided by M3IT and generated idx.jpg from the sequence idx.

LiJiaqi96 commented 2 months ago

I don't quite follow... Could you explain how to match the "image_str" in M3IT to the specific image names in the CLEVR dataset?

Andy1621 commented 2 months ago

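image_str is a base64 string and can be read directly. We converted it to an RGB image. The image name is generated from the corresponding idx while iterating over the M3IT data in a for loop; it is not derived from the original CLEVR data.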
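The naming scheme described above can be sketched roughly as follows. This is only an illustration using the standard library: the helper names (`decode_image_str`, `indexed_name`, `save_by_index`) are hypothetical, and the decoded bytes are written as-is. The RGB conversion mentioned above would need an imaging library and is not shown.

```python
import base64
import os


def decode_image_str(image_str: str) -> bytes:
    """Decode a base64-encoded image string into raw image bytes."""
    return base64.b64decode(image_str)


def indexed_name(split: str, idx: int, ext: str = "jpg") -> str:
    """Build a relative name such as 'train/39065.jpg' from a loop index."""
    return f"{split}/{idx}.{ext}"


def save_by_index(image_strs, out_dir: str, split: str = "train") -> list:
    """Write each decoded image to out_dir/<split>/<idx>.jpg.

    idx is the position of the item in the iteration order, not an
    identifier taken from the original CLEVR release.
    """
    os.makedirs(os.path.join(out_dir, split), exist_ok=True)
    names = []
    for idx, image_str in enumerate(image_strs):
        name = indexed_name(split, idx)
        with open(os.path.join(out_dir, name), "wb") as fh:
            fh.write(decode_image_str(image_str))
        names.append(name)
    return names
```

Because the name depends only on iteration order, the exported files line up with the published annotations only if the data is iterated in the same order as it was when the annotations were generated.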

LiJiaqi96 commented 2 months ago

Got it! So your idx is the iteration idx after loading the data with datasets, right?

Andy1621 commented 2 months ago

Yes, exactly.

LiJiaqi96 commented 2 months ago

OK, thank you for the explanation.

LiJiaqi96 commented 1 month ago

I still ran into some problems when exporting the images, so I need to ask you again. Here is my code:

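import base64
import os

import datasets

save_dir = "clevr_M3IT"
# Stream the CLEVR subset from a local copy of M3IT.
ds = datasets.load_dataset("./datasets/M3IT/", "clevr", split="train", streaming=True)
cur_dir = os.path.join(save_dir, "train")
os.makedirs(cur_dir, exist_ok=True)  # the output directory must exist before writing

# Name each image after its position in the iteration order.
for i, d in enumerate(ds):
    # "image_base64_str" is a list; decode its first entry into raw image bytes.
    image = base64.b64decode(d["image_base64_str"][0])
    with open(os.path.join(cur_dir, f"{i}.jpg"), "wb") as fh:
        fh.write(image)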

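After exporting some images, I manually inspected a few of them and found that they do not match the QA in OpenGVLab/VideoChat2-IT that you released on HF. For example, train/90.jpg (index 90):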
[ { "a": "The answer is cylinder.", "i": "Analyze the given image and respond to the associated question with a correct answer.", "q": "There is a green object that is behind the small rubber cylinder that is to the left of the matte cylinder to the right of the gray thing; what is its shape?" } ]

Andy1621 commented 1 month ago

Strange, that is not the image we have on our side. I'll ask the teammate who processed the data to take a look.

LiJiaqi96 commented 1 month ago

OK, thanks~

Andy1621 commented 1 month ago

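Hello, I asked my teammate to check. For some datasets (such as CLEVR), the meta information provided by M3IT includes an image_index; for the other datasets, the index comes from the for-loop position.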

LiJiaqi96 commented 1 month ago

I see. However, I don't seem to find image_index in the CLEVR metadata. My code is:

ds = datasets.load_dataset("./datasets/M3IT/", "clevr", split="train", streaming=True)
ds.info

yinanhe commented 1 month ago

> I see. However, I don't seem to find image_index in the CLEVR metadata. My code is:
>
> ds = datasets.load_dataset("./datasets/M3IT/", "clevr", split="train", streaming=True)
> ds.info

Sorry, I only just saw this question. We read the images by directly downloading the jsonl files from the Hugging Face dataset repo.
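A minimal sketch of that approach, for illustration only. It assumes each line of the jsonl file is a JSON object whose base64 image lives under the key "image_str" (the field name mentioned earlier in this thread), and it numbers the output files by record order. The thread does not confirm whether record order or an image_index field from the meta information was used for naming, so treat both as assumptions; `export_jsonl_images` is a hypothetical helper name.

```python
import base64
import json
import os


def export_jsonl_images(jsonl_path, out_dir, image_key="image_str"):
    """Decode the image of every JSONL record and save it as <record index>.jpg.

    image_key is an assumed field name; adjust it to match the actual file.
    Returns the number of images written.
    """
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(jsonl_path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines without consuming an index
            record = json.loads(line)
            image_bytes = base64.b64decode(record[image_key])
            with open(os.path.join(out_dir, f"{count}.jpg"), "wb") as out:
                out.write(image_bytes)
            count += 1
    return count
```

For example, `export_jsonl_images("train.jsonl", "clevr_M3IT/train")` would write `0.jpg`, `1.jpg`, and so on, which can then be spot-checked against a few entries of the published QA annotations.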

LiJiaqi96 commented 1 month ago

It works now! Just to confirm, you used train.jsonl from the Hugging Face dataset repo (rather than train_2023-10-07.jsonl), right?
https://huggingface.co/datasets/MMInstruction/M3IT/tree/main/data/reasoning/clevr