baaivision / tokenize-anything

[ECCV 2024] Tokenize Anything via Prompting
Apache License 2.0
503 stars, 19 forks

How are the GT labels for captions generated? #8

Closed: fujianhai closed this issue 8 months ago

fujianhai commented 8 months ago

I have a question: how are the GT labels for captions generated?

PhyscalX commented 8 months ago

Hi, @fujianhai

We freeze the image encoder-decoder during caption training. Classification labels are unnecessary.

For ground-truth captions, we use the GRIT preprocessed train JSON and test JSON. These JSON files contain only boxes and captions.
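As a rough illustration of how box-and-caption annotations of this kind could be read, here is a minimal sketch. It assumes a COCO-style layout with an `annotations` list whose entries carry `image_id`, `bbox`, and `caption` fields; those field names, the `[x, y, w, h]` box convention, and the helper `load_region_captions` are assumptions made for the example and are not confirmed in this thread.

```python
import json
from collections import defaultdict


def load_region_captions(json_text):
    """Group (box, caption) pairs by image id from a COCO-style JSON string.

    Assumed (hypothetical) schema: {"annotations": [{"image_id", "bbox", "caption"}, ...]}
    """
    data = json.loads(json_text)
    regions = defaultdict(list)
    for ann in data.get("annotations", []):
        # Each region is a box plus its free-form caption; no class label is read.
        regions[ann["image_id"]].append((tuple(ann["bbox"]), ann["caption"]))
    return dict(regions)


# Tiny in-memory sample standing in for a real annotation file.
sample = json.dumps({
    "images": [{"id": 1, "file_name": "1.jpg"}],
    "annotations": [
        {"image_id": 1, "bbox": [10, 20, 50, 40], "caption": "a red car"},
        {"image_id": 1, "bbox": [0, 0, 30, 30], "caption": "a tree"},
    ],
})
regions = load_region_captions(sample)
```

With a real file, the same function could be fed the result of reading the file as text; the actual key names should be checked against the file before relying on this.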

fujianhai commented 8 months ago

@PhyscalX, thank you very much. Did you train the captioning on something like the COCO caption data?

PhyscalX commented 8 months ago

No. VisualGenome is currently the largest public RegionCaption dataset. COCO caption and other Image-Text datasets are typical ImageCaption datasets.