fujianhai closed this 8 months ago
@PhyscalX, thank you very much. Did you train the captioning on something like the COCO caption data?
No. VisualGenome is currently the largest public RegionCaption dataset. COCO caption and other Image-Text datasets are typical ImageCaption datasets.
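To illustrate the distinction, here is a minimal sketch of how the two annotation styles differ. The field names are assumptions modeled on the public Visual Genome region descriptions and COCO caption annotations, the records are made-up examples, and `CaptionSample`, `region_samples`, and `image_samples` are hypothetical helpers, not this repository's data loader.

```python
from typing import List, NamedTuple, Optional, Tuple


class CaptionSample(NamedTuple):
    image_id: int
    box: Optional[Tuple[int, int, int, int]]  # (x, y, width, height); None = whole image
    text: str


# Made-up record in a Visual Genome-like layout (assumed field names):
# every phrase is tied to a bounding box inside the image.
vg_style = [
    {
        "id": 1,
        "regions": [
            {"region_id": 10, "image_id": 1, "x": 5, "y": 8,
             "width": 40, "height": 30, "phrase": "a red umbrella"},
            {"region_id": 11, "image_id": 1, "x": 60, "y": 12,
             "width": 25, "height": 70, "phrase": "man wearing a hat"},
        ],
    }
]

# Made-up record in a COCO caption-like layout (assumed field names):
# every caption describes the whole image and carries no box.
coco_style = {
    "annotations": [
        {"id": 100, "image_id": 2,
         "caption": "A man holds an umbrella on a street."},
    ]
}


def region_samples(entries: List[dict]) -> List[CaptionSample]:
    """Flatten region-level annotations into (image, box, phrase) samples."""
    samples = []
    for entry in entries:
        for region in entry["regions"]:
            box = (region["x"], region["y"], region["width"], region["height"])
            samples.append(CaptionSample(region["image_id"], box, region["phrase"]))
    return samples


def image_samples(coco: dict) -> List[CaptionSample]:
    """Flatten image-level annotations; the box is None for every sample."""
    return [
        CaptionSample(ann["image_id"], None, ann["caption"])
        for ann in coco["annotations"]
    ]


if __name__ == "__main__":
    for sample in region_samples(vg_style) + image_samples(coco_style):
        print(sample)
```

In the region-level case each ground-truth caption comes paired with a box, while in the image-level case the only supervision is the sentence for the full image.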
I have a question: how are the GT labels for the captions generated?