Using the CLEVR dataset - Githubissues

OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

https://vchat.opengvlab.com/

MIT License

2.85k stars 230 forks

Using the CLEVR dataset #169

Open LiJiaqi96 opened 2 months ago

LiJiaqi96 commented 2 months ago

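Hello, which dataset exactly is image_reasoning - clevr? Following the citation in the paper, I found https://cs.stanford.edu/people/jcjohns/clevr/ and downloaded [CLEVR v1.0 (18 GB)], but after extracting it the image contents do not correspond to the format in the json.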

Andy1621 commented 2 months ago

Hello, all of the image data is what M3IT provides.

LiJiaqi96 commented 2 months ago

Thanks. I looked at M3IT, and the image field in its json is one long string. How do I map those to the form VideoChat2 uses, such as "train/39065.jpg"?

Andy1621 commented 2 months ago

We followed the annotations M3IT provides and generated idx.jpg from the sequence idx.

LiJiaqi96 commented 2 months ago

I didn't quite follow... How do I map the "image_str" in M3IT to the specific image names in the CLEVR dataset?

Andy1621 commented 2 months ago

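image_str is a base64 string and can be read directly. We converted it to RGB images. The image names were generated from the idx while iterating over the M3IT data in a for loop, not derived from the original CLEVR data.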

LiJiaqi96 commented 2 months ago

Got it! Your idx is the index you get when iterating after loading the data with datasets, right?

Andy1621 commented 2 months ago

Yep

LiJiaqi96 commented 2 months ago

OK, thank you for the answer.

LiJiaqi96 commented 1 month ago

I still ran into some problems when writing the output, so I need to ask you again. Here is my code:

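import os
import base64
import datasets

save_dir = "clevr_M3IT"
# Stream the "clevr" config from a local copy of M3IT.
ds = datasets.load_dataset("./datasets/M3IT/", "clevr", split="train", streaming=True)
cur_dir = os.path.join(save_dir, "train")
os.makedirs(cur_dir, exist_ok=True)  # open() below fails if the directory is missing
# Name each file after its position in the iteration order.
for i, d in enumerate(ds):
    # Take the first entry of image_base64_str and decode it to raw image bytes.
    image = base64.decodebytes(d["image_base64_str"][0].encode())
    with open(os.path.join(cur_dir, f"{i}.jpg"), "wb") as fh:
        fh.write(image)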

After writing out some images, I checked a few of them by hand and found that they do not match the QA in OpenGVLab/VideoChat2-IT that you published on HF, for example train/90.jpg:
[ { "a": "The answer is cylinder.", "i": "Analyze the given image and respond to the associated question with a correct answer.", "q": "There is a green object that is behind the small rubber cylinder that is to the left of the matte cylinder to the right of the gray thing; what is its shape?" } ]

Andy1621 commented 1 month ago

Strange, that is not the image we have on our side. I'll ask the colleague who processed it back then to take a look.

LiJiaqi96 commented 1 month ago

OK, thanks~

Andy1621 commented 1 month ago

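Hello, I asked my colleague to check. For some datasets (such as CLEVR), the meta information provided in M3IT contains an image_index; for the other datasets, the index comes from the for loop.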
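
The naming rule described above could be sketched as follows. This is only an illustration, not the maintainers' actual script: the record layout (a dict with an optional "meta" dict holding "image_index") and the helper name image_filename are assumptions made for the example.

```python
def image_filename(record, loop_idx, split="train"):
    """Build '<split>/<idx>.jpg' for one sample.

    Assumed layout (hypothetical): record is a dict that may carry a
    "meta" dict with an "image_index" entry. When that entry exists it
    is used as idx; otherwise the for-loop index is used.
    """
    meta = record.get("meta") or {}
    idx = meta.get("image_index", loop_idx)
    return f"{split}/{idx}.jpg"
```

With this rule, a sample whose meta carries an image_index keeps that number regardless of where it appears in the iteration, which would explain why files named purely by loop position can disagree with the published QA.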

LiJiaqi96 commented 1 month ago

I see. However, I don't seem to find image_index in the CLEVR metadata. The code is:

ds = datasets.load_dataset("./datasets/M3IT/", "clevr", split="train", streaming=True)
ds.info
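
ds.info describes the dataset as a whole, so a per-sample field would have to be looked for on the records themselves. A minimal sketch for listing the field names of the first few samples, assuming the records are plain dicts as in the loop earlier in this thread (peek_fields is a hypothetical helper, not part of datasets):

```python
from itertools import islice


def peek_fields(records, n=3):
    """Return the sorted field names seen in the first n records."""
    fields = set()
    for record in islice(records, n):
        fields.update(record.keys())
    return sorted(fields)
```

It would be called as peek_fields(ds) on the streaming dataset loaded above. If image_index does not show up there, the loader may simply not expose it.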

yinanhe commented 1 month ago

Sorry, I just saw this question. We read the data by directly downloading the jsonl files from the huggingface dataset repo.
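
Reading a downloaded jsonl file directly might look like the sketch below. It is written under assumptions: the field name "image_str" is taken from earlier in this thread and has not been checked against the actual files, and iter_jsonl_images is a hypothetical helper. Each non-blank line is parsed as one JSON record and its base64 image is decoded to raw bytes.

```python
import base64
import json


def iter_jsonl_images(jsonl_path, image_key="image_str"):
    """Yield (line_idx, record, image_bytes) for each non-blank line.

    image_key is an assumption; adjust it to whatever field holds the
    base64 string in the file being read. line_idx is the 0-based
    position of the line in the file, blank lines included.
    """
    with open(jsonl_path, "r", encoding="utf-8") as fh:
        for line_idx, line in enumerate(fh):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            yield line_idx, record, base64.b64decode(record[image_key])
```

Because the full record is yielded alongside the bytes, the caller can inspect any meta information in it before deciding how to name the output file.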

LiJiaqi96 commented 1 month ago

It works now! Just to confirm, you used train.jsonl from the huggingface dataset repo (rather than train_2023-10-07.jsonl), right?
https://huggingface.co/datasets/MMInstruction/M3IT/tree/main/data/reasoning/clevr