Closed theAdamColton closed 1 year ago
Hi @theAdamColton, thanks so much for sharing this information!
This may be caused by some errors in sending the requests to GPT-4. By a quick investigation, we filter the responses of detailed descriptions by "message is empty" and find the following invalid instances:
30
4417
1159649
1591845
2316987
2319201
2319806
2327174
2342212
2343061
2349999
2360367
2378561
2382059
2382113
2385150
2394393
2397078
2402123
2403887
2415254
As a workaround, you can exclude these cases for now. We'll record this issue and try to fix it in later releases.
Does the data updated now?
Hi. The next version will be released in a month with more data.
Hi. The next version will be released in a month with more data.
hi, i have a question. dose this dataset contain images? or just caption and gpt4 response? i need a dataset with image , user/question and response/answer for research.
Hi. The next version will be released in a month with more data.
hi, i have a question. dose this dataset contain images? or just caption and gpt4 response? i need a dataset with image , user/question and response/answer for research.
Yes, the images are from Visual Genome. You can download them from Visual Genome or here.
Hi. The next version will be released in a month with more data.
hi, i have a question. dose this dataset contain images? or just caption and gpt4 response? i need a dataset with image , user/question and response/answer for research.
Yes, the images are from Visual Genome. You can download them from Visual Genome or here.
Should I need to load images.zip with svit.zip file ?
Update: The detailed descriptions with "message is empty" are removed from the dataset. Besides, we've added a new subset for referring QAs. You can find the data in this link.
Close this issue as solved in the update. Feel free to reopen it if there's any other questions.
Close this issue as solved in the update. Feel free to reopen it if there's any other questions.
I will suggest, to make the Dataset clear , if someone want to use, like this. Image, question, response.
Because when you download the dataset from hugging face there is no image colom in the data.
Hello, could I ask whether the next version with more data is still base on vg or more images?
The new verson has been released. It is also based on Visual Genome, but with bounding boxes for referring (QA) tasks.
Thank you! Very nice work.
Hi. The next version will be released in a month with more data.
hi, i have a question. dose this dataset contain images? or just caption and gpt4 response? i need a dataset with image , user/question and response/answer for research.
Yes, the images are from Visual Genome. You can download them from Visual Genome or here.
Should I need to load images.zip with svit.zip file ?
Yes, you'll need to load images.zip from Visual Genome, as well as the SVIT.zip for instructions and responses.
Close this issue as solved in the update. Feel free to reopen it if there's any other questions.
I will suggest, to make the Dataset clear , if someone want to use, like this. Image, question, response.
Because when you download the dataset from hugging face there is no image colom in the data.
Thanks for the suggestion! Currently we keep the images and instructions separate, as some users may have already downloaded the Visual Genome images before. We'll add this suggestion to the backlog.
Does the data updated now?
Hi @lucasjinreal, the data is updated now. The empty messages of detailed descriptions are excluded from the dataset. And a subset about referring QAs is also added in this release.
So far I've noticed this with the descriptions for images 30, 4417, 1159649, 2316987, 2319201, 2343061
for these descriptions, the image detailed description is: "It seems like your message is empty. How can I assist you today?", or something along the lines of "It seems like your message is empty. Can you please provide more details or ask a question so that I can assist you better?".
I discovered this by converting the detail descriptions data to json, and grepping for the term 'message is empty'