Closed lucasjinreal closed 3 months ago
How are these image files actually supposed to be mapped?
I've also noticed this issue. It seems that after the existing allava was updated, some pictures were deleted. Can the repository owner take a look at the latest allava dataset and provide relevant information? @yanwei-li
They're not deleted; the index numbers just don't match ALLaVA's images. Besides, I think the image names provided in the minigemini data are truncated: where is the jpeg suffix? This really shouldn't happen. It is very confusing and misleading for users.
I tried using the URLs in the minigemini pretrain dataset to match against the URLs in the allava dataset; the minigemini pretrain dataset is missing about 6939 images.
Some missing images are normal. The question is: do we really have to use the URL to match the correct image id (filename)?
This is ridiculous.
@daicver @lucasjinreal Hi, thanks for using the ALLaVA data. I am from the ALLaVA group. We did make a silent update soon after we released the data, and it seems that the Mini-Gemini project was using the data from before the update.
In the original version, the images entry looked like allava_laion/allava_laion_512763 and the image filenames had no suffix either, which means they were mapped correctly.
In the current version, we adjusted the annotations and the image filenames at the same time, adding a suffix.
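For anyone whose annotations still carry the old suffix-less names, a minimal sketch of resolving them against the renamed files could look like the following. The `resolve_image` helper and the directory layout are assumptions for illustration, not part of either repo; it only assumes that the new file keeps the same stem with some extension appended.

```python
from pathlib import Path
from typing import Optional


def resolve_image(root: Path, entry: str) -> Optional[Path]:
    """Return the file on disk for a suffix-less entry, or None if absent.

    `entry` is a relative path such as "allava_laion/allava_laion_512763".
    Hypothetical helper: assumes the only change was an added extension.
    """
    candidate = root / entry
    if candidate.is_file():
        return candidate  # old layout: the name matches as-is
    if not candidate.parent.is_dir():
        return None
    # New layout: same stem plus some extension (.jpeg, .jpg, .png, ...).
    matches = sorted(
        p for p in candidate.parent.glob(candidate.name + ".*") if p.is_file()
    )
    return matches[0] if matches else None
```

Entries that return None here are the ones that need a different matching key, since a renamed suffix alone does not explain them.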
We will fix this issue soon with the Mini-Gemini team. Stay tuned!
@g-h-chen Oh, I didn't notice that I had actually downloaded an updated version.
So it looks like minigemini was using the older data. Just wondering, did the GPT responses also change? Is the newer data a superset of the older one?
If so, we can probably just map the names by URL?
@lucasjinreal
So it looks like minigemini was using the older data. Just wondering, did the GPT responses also change?
No change in the GPT-4V responses.
Is the newer data a superset of the older one?
No. In short, we only added suffixes to the image filenames and annotation files so that they can be previewed easily.
We can probably just map the names by URL?
Sure, you can do so, but we have uploaded the images to our repo as well, which saves you some effort.
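If you do go the URL route, a rough sketch of the join might look like this. The field names `url` and `image` are assumptions, so check them against the actual JSON files; `remap_by_url` is a hypothetical helper, not something shipped by either project.

```python
from typing import Dict, Iterable, List, Tuple


def remap_by_url(
    old_records: Iterable[dict],
    new_records: Iterable[dict],
    url_key: str = "url",      # assumed field name
    image_key: str = "image",  # assumed field name
) -> Tuple[List[dict], List[dict]]:
    """Rewrite image names in old records using the new dataset, joined on URL.

    Returns (remapped, unmatched); unmatched records are returned unchanged.
    """
    url_to_image: Dict[str, str] = {}
    for rec in new_records:
        url, name = rec.get(url_key), rec.get(image_key)
        if url is not None and name is not None:
            # Keep the first name seen if the same URL appears more than once.
            url_to_image.setdefault(url, name)

    remapped: List[dict] = []
    unmatched: List[dict] = []
    for rec in old_records:
        new_name = url_to_image.get(rec.get(url_key))
        if new_name is None:
            unmatched.append(rec)
        else:
            remapped.append({**rec, image_key: new_name})
    return remapped, unmatched
```

The length of the `unmatched` list is the number of records that cannot be recovered by URL alone.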
Yes, I downloaded the images.
Oh, I found that even when mapping by URL, some files still can't be mapped correctly. Any solution?
For example:
/ALLaVA-4V/allava_laion/images/281029 not found
/allava_laion/images/387904
Hi, we are working together with @g-h-chen to align the data and will update the data file soon. Please stay tuned.
Looks like the situation is not exactly as @g-h-chen said: the new ALLaVA actually deleted some images that minigemini used, and I don't know why.
Hi @lucasjinreal @daicver, we have updated the ALLaVA data in our files. Please download them from the original links, Mini-Gemini-Pretrain and Mini-Gemini-Instruction. We will also re-train our model to see the effect of the data change.
ok, thanks
Dudes, (hopefully) a final comment here:
The id entry of each item for allava_laion (caption and instruction) is not unique. The number of samples is 505588, and the number of unique ids is 484532. However, the contents are not the same for samples sharing the same id. The reason is that when the ALLaVA project started, we tried out some prompting strategies which led to our final version, but we accidentally re-generated those samples when performing large-scale distillation. We kept those samples anyway, considering the cost.
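By those numbers, 505588 - 484532 = 21056 samples repeat an id that already appears elsewhere. If you want to check this on your own copy, a small sketch (assuming each annotation file is a JSON list of dicts with an `id` key; `duplicate_ids` is a hypothetical helper) could be:

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_ids(records: Iterable[dict], id_key: str = "id") -> Dict[str, int]:
    """Map each id that occurs more than once to its number of occurrences."""
    counts = Counter(rec[id_key] for rec in records)
    return {key: n for key, n in counts.items() if n > 1}
```

Since samples sharing an id have different contents, deduplicating on id alone would drop data; keying on the id together with the sample's position in the file is one way to keep them apart.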
The Mini-Gemini team and we have updated the aligned data. Sorry for the inconvenience caused!
Thank you guys for the quick response. Closing as this is solved.
Hi, the pretraining data uses allava images from both laion and vflan.
But the image names in the laion part are totally different from the format of ALLaVA's images.
I tried to find:
They are all used in minigemini_pretrain.json but cannot be found in the ALLaVA images folder.
Why is that?