Rexzhan opened this issue 10 months ago
Thanks so much for the reminder; I have just uploaded the updated files at:
https://huggingface.co/xdecoder/SEEM/blob/main/panoptic_train2017_filtrefgumdval.json
https://huggingface.co/xdecoder/SEEM/blob/main/grounding_train2017_filtrefgumd.json
https://huggingface.co/xdecoder/SEEM/blob/main/coco_train2017_filtrefgumdval_lvis.json
https://huggingface.co/xdecoder/SEEM/blob/main/captions_train2017_filtrefgumdval.json
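For anyone scripting the download: the blob URLs above point to the Hugging Face file pages, not the raw files. A minimal sketch using `huggingface_hub` to fetch the raw files (the target directory is my own assumption, not stated in the thread):

```python
from huggingface_hub import hf_hub_download

# The four updated annotation files from the xdecoder/SEEM repo.
files = [
    "panoptic_train2017_filtrefgumdval.json",
    "grounding_train2017_filtrefgumd.json",
    "coco_train2017_filtrefgumdval_lvis.json",
    "captions_train2017_filtrefgumdval.json",
]

for name in files:
    # hf_hub_download resolves the raw file behind the "blob" page.
    path = hf_hub_download(
        repo_id="xdecoder/SEEM",
        filename=name,
        local_dir=".xdecoder_data/coco/annotations",  # assumed target dir
    )
    print("downloaded:", path)
```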
@MaureenZOU Thanks for the update! It seems these new files are not reflected in the README yet. To reproduce SEEM, I assume these new files are the correct ones?
@ziqipang, @MaureenZOU May I ask at which training stage the model consumes coco_caption_karpathy_test.arrow, filtcoco2017val_caption_karpathy_train.arrow, etc. in pretrain_arrows_code224? I followed TRAIN.md, but the provided config files do not seem to use vlp_dataset/coco_caption_karpathy. Please correct me if I am wrong.
@Rexzhan I think X-Decoder uses that vision-language data, but SEEM only uses the segmentation data. I am just starting to work in this field, so please double-check whether my understanding is correct.
@MaureenZOU Sorry, but I don't see the file "panoptic_val2017.json" mentioned in DATASET.md. Could you upload it? Thanks.
This can be downloaded from the official website.
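For reference, panoptic_val2017.json ships inside the official COCO panoptic annotations archive. A minimal download-and-extract sketch (the archive URL is the standard COCO download; the target directory is my assumption):

```python
import urllib.request
import zipfile

# Official COCO panoptic annotations archive; it contains
# annotations/panoptic_train2017.json and annotations/panoptic_val2017.json.
url = "http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip"
archive = "panoptic_annotations_trainval2017.zip"

urllib.request.urlretrieve(url, archive)
with zipfile.ZipFile(archive) as zf:
    zf.extractall(".xdecoder_data/coco")  # assumed target dir
```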
According to DATASET.md, I have downloaded the COCO 2017 dataset from the official website (https://cocodataset.org/#download). I have also created some *.arrow files, but I can't create all of the files needed for training.
How do I create the "4M Image Text Pairs" with ViLT?
4M Image Text Pairs (X-Decoder)
We follow the exact data preparation for the image-text pair data described in [ViLT](https://github.com/dandelin/ViLT/blob/master/DATA.md).
# The pretrained arrow files are put under .xdecoder_data/pretrain_arrows_code224 with the following list of files:
["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] + ["code224_vg.arrow"] + [f"code224_sbu_{i}.arrow" for i in range(9)] + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
# ["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] are originated from ["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] with deletion of coco val2017 overlapped images to avoid information leakage.
To get started quickly:
# Download the COCO Karpathy test set (the codebase hacks the training data to be coco_caption_karpathy_test.arrow only, for a quick start)
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/coco_caption_karpathy_test.arrow
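As a quick sanity check after the download, the file should open as an Arrow IPC file. A minimal sketch (the column names depend on how the arrows were built, so inspect the schema rather than assuming it):

```python
import pyarrow as pa

# Open the quick-start arrow file the same way ViLT-style loaders do.
with pa.memory_map("pretrain_arrows_code224/coco_caption_karpathy_test.arrow", "r") as source:
    table = pa.ipc.open_file(source).read_all()

print(table.num_rows, "rows")
print(table.schema)  # inspect the actual column names here
```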
After dataset preparation, the dataset structure would be:
.xdecoder_data
└── pretrain_arrows_code224/
├── coco_caption_karpathy_test.arrow
├── *filtcoco2017val_caption_karpathy_train.arrow
├── ...
├── *code224_vg.arrow
├── *code224_sbu_0.arrow
├── ...
├── *code224_conceptual_caption_train_0.arrow
└── ...
* Datasets marked with an asterisk are optional while debugging the pipeline, but they NEED to be added back when you train the model.
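To verify the full layout above before launching training, a small check over the expected file list (the names mirror the list quoted earlier; nothing else is assumed):

```python
import os

root = ".xdecoder_data/pretrain_arrows_code224"

# Expected arrow files, reconstructed from the list quoted above.
expected = (
    ["filtcoco2017val_caption_karpathy_train.arrow",
     "filtcoco2017val_caption_karpathy_val.arrow",
     "filtcoco2017val_caption_karpathy_restval.arrow"]
    + ["code224_vg.arrow"]
    + [f"code224_sbu_{i}.arrow" for i in range(9)]
    + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
)

missing = [name for name in expected if not os.path.exists(os.path.join(root, name))]
print("missing:", missing or "none")
```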
Nice work! But I hit an error when trying to reproduce your training process. Could you tell me how to prepare these three files: "coco/annotations/panoptic_train2017_filtrefgumdval.json", "coco/annotations/captions_train2017_filtrefgumdval.json", and "coco/annotations/grounding_train2017_filtrefgumd.json"? Following your dataset preparation guide, I only ended up with panoptic_train2017_filtrefgumdval_filtvlp / captions_train2017_filtrefgumdval_filtvlp / grounding_train2017_filtrefgumdval_filtvlp.