WisconsinAIVision / ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
https://vip-llava.github.io/
Apache License 2.0

Can you provide a ViP-only subset, excluding the original LLaVA mix data? #7

Open lucasjinreal opened 4 months ago

lucasjinreal commented 4 months ago

That way, the model cannot be inadvertently trained on the original LLaVA dataset mixed into the release.

Meanwhile, it looks like the referenced images are missing:

```json
    "id": "vcr-52941",
    "image": "vcr1images/lsmdc_3034_IDES_OF_MARCH/3034_IDES_OF_MARCH_01.27.04.788-01.27.10.308@2.jpg",
    "meta_dir": "./dataset/vcr1images/lsmdc_3034_IDES_OF_MARCH/3034_IDES_OF_MARCH_01.27.04.788-01.27.10.308@2.json",
    "class_names": [
        "person",
```
mu-cai commented 4 months ago

Hello,

Thanks for your interest in our work!

lucasjinreal commented 4 months ago

Hi, is the meta file needed to prepare the training images?

mu-cai commented 4 months ago

The metadata is included in the vcr1images directory, so no need to worry; it is already there.

lucasjinreal commented 4 months ago

I want to use the official LLaVA codebase. Do I just need to add a ViP processor to process the images?

mu-cai commented 4 months ago

Correct. Visual prompt blending is all you need.
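For what it's worth, here is a minimal sketch of what blending a visual prompt into the image could look like before handing it to the standard LLaVA image processor. The helper name, drawing parameters, and ellipse-only prompt are assumptions for illustration, not the repo's actual implementation (ViP-LLaVA also supports arrows, scribbles, masks, etc.):

```python
from PIL import Image, ImageDraw

def blend_visual_prompt(image: Image.Image, bbox, alpha: float = 0.6) -> Image.Image:
    """Alpha-blend a simple visual prompt (a red ellipse around `bbox`)
    onto the image. `bbox` is (x0, y0, x1, y1) in pixel coordinates.
    Hypothetical helper, shown only as a sketch of the idea."""
    base = image.convert("RGBA")
    overlay = base.copy()
    draw = ImageDraw.Draw(overlay)
    draw.ellipse(bbox, outline=(255, 0, 0, 255), width=6)
    # Blend the annotated copy back onto the original so the prompt
    # appears at partial opacity while the rest of the image is unchanged.
    blended = Image.blend(base, overlay, alpha)
    return blended.convert("RGB")

# Usage: blend the prompt first, then run the usual LLaVA preprocessing
# on the resulting image (e.g. CLIP image processor) as you normally would.
img = Image.open("example.jpg")
img_with_prompt = blend_visual_prompt(img, bbox=(100, 80, 260, 240))
```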