UX-Decoder / Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
Apache License 2.0

Question about the dataset. #111

Open Rexzhan opened 10 months ago

Rexzhan commented 10 months ago

Nice work! But I hit an error when trying to reproduce your training process. Could you tell me how to prepare these three files: "coco/annotations/panoptic_train2017_filtrefgumdval.json", "coco/annotations/captions_train2017_filtrefgumdval.json", and "coco/annotations/grounding_train2017_filtrefgumd.json"? Following your dataset preparation guide, I only ended up with panoptic_train2017_filtrefgumdval_filtvlp / captions_train2017_filtrefgumdval_filtvlp / grounding_train2017_filtrefgumdval_filtvlp.

MaureenZOU commented 10 months ago

Thanks so much for the reminder! I just updated the files at:

https://huggingface.co/xdecoder/SEEM/blob/main/panoptic_train2017_filtrefgumdval.json
https://huggingface.co/xdecoder/SEEM/blob/main/grounding_train2017_filtrefgumd.json
https://huggingface.co/xdecoder/SEEM/blob/main/coco_train2017_filtrefgumdval_lvis.json
https://huggingface.co/xdecoder/SEEM/blob/main/captions_train2017_filtrefgumdval.json
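A minimal download sketch using the huggingface_hub client (the coco/annotations/ target directory is an assumption based on the paths in the question; adjust it to your data root):

```python
# Minimal sketch: fetch the updated annotation files from the xdecoder/SEEM
# Hugging Face repo. The coco/annotations/ target is an assumption; point
# local_dir at your own data root.
from huggingface_hub import hf_hub_download

files = [
    "panoptic_train2017_filtrefgumdval.json",
    "grounding_train2017_filtrefgumd.json",
    "coco_train2017_filtrefgumdval_lvis.json",
    "captions_train2017_filtrefgumdval.json",
]
for name in files:
    path = hf_hub_download(repo_id="xdecoder/SEEM", filename=name,
                           local_dir="coco/annotations")
    print("downloaded:", path)
```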

ziqipang commented 10 months ago

@MaureenZOU Thanks for the update! It seems these new files are not reflected in the README yet. To reproduce SEEM, I assume these new files are the correct ones?

Rexzhan commented 10 months ago

@ziqipang @MaureenZOU May I ask at what point the model consumes coco_caption_karpathy_test.arrow, filtcoco2017val_caption_karpathy_train.arrow, etc. in pretrain_arrows_code224 during the training stage? I followed the script in TRAIN.md, but the provided config files do not seem to use vlp_dataset/coco_caption_karpathy. Please correct me if I am wrong.

ziqipang commented 10 months ago

@Rexzhan I think X-Decoder uses that vision-language data, but SEEM only uses the segmentation data. I am just starting to work in this field, so please double-check whether my understanding is correct.
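One way to settle this is to inspect which datasets a training config actually registers. A rough sketch (the config path is hypothetical, and the DATASETS.TRAIN/TEST keys follow the detectron2-style layout these configs appear to use, so treat both as assumptions):

```python
# Rough sketch: print the dataset lists from a training config to see whether
# any vlp/caption dataset is referenced. Both the config path and the
# DATASETS.TRAIN/TEST keys are assumptions about this codebase's layout.
import yaml

with open("configs/seem/focall_unicl_lang_v1.yaml") as f:  # hypothetical path
    cfg = yaml.safe_load(f)

datasets = cfg.get("DATASETS", {})
print("train datasets:", datasets.get("TRAIN"))
print("test datasets:", datasets.get("TEST"))
```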

xpzwzwz commented 10 months ago

@MaureenZOU Sorry, but I don't see the file "panoptic_val2017.json" you mentioned in DATASET.md. Could you upload it? Thanks.

MaureenZOU commented 10 months ago

This can be downloaded from the official COCO website.
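For reference, panoptic_val2017.json ships inside the panoptic annotations archive on cocodataset.org; a small sketch to fetch and unpack it (verify the archive URL against the COCO download page if it has moved):

```python
# Small sketch: download and unpack the COCO panoptic annotations, which
# include annotations/panoptic_val2017.json. The URL is the standard COCO
# download; verify it on https://cocodataset.org/#download if it has moved.
import urllib.request
import zipfile

url = "http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip"
urllib.request.urlretrieve(url, "panoptic_annotations_trainval2017.zip")

with zipfile.ZipFile("panoptic_annotations_trainval2017.zip") as zf:
    zf.extractall("coco/")  # yields coco/annotations/panoptic_val2017.json
```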

seungyoungshin commented 10 months ago

Following DATASET.md, I downloaded the COCO 2017 dataset from the official website (https://cocodataset.org/#download) and created some of the *.arrow files, but I can't create all of the files needed for training.

How do I create the "4M Image Text Pairs" with ViLT?

4M Image Text Pairs (X-Decoder)
We follow the exact data preparation for the image-text pairs data from [ViLT](https://github.com/dandelin/ViLT/blob/master/DATA.md) (a conversion sketch follows the file list below).

# The pretrained arrow files are put under .xdecoder_data/pretrain_arrows_code224 with the following list of files.
["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] + ["code224_vg.arrow"] + [f"code224_sbu_{i}.arrow" for i in range(9)] + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
# ["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] originate from ["coco_caption_karpathy_train.arrow", "coco_caption_karpathy_val.arrow", "coco_caption_karpathy_restval.arrow"] with images that overlap coco val2017 deleted to avoid information leakage.
To get started quickly:

# Download the COCO Karpathy test set (for a quick start, the codebase hacks the training data to be coco_caption_karpathy_test.arrow only)
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/coco_caption_karpathy_test.arrow
After dataset preparation, the dataset structure would be:

.xdecoder_data
└── pretrain_arrows_code224/
    ├── coco_caption_karpathy_test.arrow
    ├── *filtcoco2017val_caption_karpathy_train.arrow
    ├── ...
    ├── *code224_vg.arrow
    ├── *code224_sbu_0.arrow
    ├── ...
    ├── *code224_conceptual_caption_train_0.arrow
    └── ...
* These datasets are optional when only debugging the pipeline, but they NEED to be added back when you train the model.
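A small completeness check built from the exact file list above, reporting which expected arrow files are still missing:

```python
# Sketch: verify that all arrow files expected for pretraining are present
# under .xdecoder_data/pretrain_arrows_code224 (list taken from above).
import os

arrow_dir = ".xdecoder_data/pretrain_arrows_code224"
expected = (
    ["filtcoco2017val_caption_karpathy_train.arrow",
     "filtcoco2017val_caption_karpathy_val.arrow",
     "filtcoco2017val_caption_karpathy_restval.arrow"]
    + ["code224_vg.arrow"]
    + [f"code224_sbu_{i}.arrow" for i in range(9)]
    + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
)
missing = [f for f in expected if not os.path.exists(os.path.join(arrow_dir, f))]
print(f"{len(expected) - len(missing)}/{len(expected)} present; missing: {missing}")
```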