OpenGVLab / all-seeing

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
https://huggingface.co/spaces/OpenGVLab/all-seeing

About 100k answers in the SAM data are `"What is the difference between a man and a woman?"` #18

Closed: janghyuncho closed this issue 2 months ago

janghyuncho commented 3 months ago

I guess there was some kind of bug? See, for example, this record:

{'id': -1, 'image': 'sam/sa_000044/sa_500317.jpg', 'height': 1500, 'width': 2666, 'conversations': [{'from': 'human', 'value': '\nWhat is the primary function of these missiles? Please answer the question according to this region: [[529, 530, 590, 622]].'}, {'from': 'gpt', 'value': '"What is the difference between a man and a woman?"'}, {'from': 'human', 'value': 'What is the primary function of these missiles? Please answer the question according to this region: [[606, 597, 750, 737]].'}, {'from': 'gpt', 'value': '"What is the difference between a man and a woman?"'}]}
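For reference, here is a minimal sketch of how one could count how many GPT-side answers are exactly this string. The file name `as_sam_annotation.json` and the assumption that the file is a JSON list of records shaped like the sample above are guesses, not the repo's actual layout:

```python
import json

# Hypothetical path; point this at the released SAM annotation file.
ANNOTATION_PATH = "as_sam_annotation.json"
BAD_ANSWER = '"What is the difference between a man and a woman?"'

with open(ANNOTATION_PATH) as f:
    records = json.load(f)  # assumed: a list of dicts like the sample above

bad_turns = 0
bad_images = set()
for rec in records:
    for turn in rec.get("conversations", []):
        # Only GPT-side turns carry answers; flag exact matches to the suspicious string.
        if turn.get("from") == "gpt" and turn.get("value", "").strip() == BAD_ANSWER:
            bad_turns += 1
            bad_images.add(rec.get("image"))

print(f"{bad_turns} suspicious answers across {len(bad_images)} images")
```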

muyuelingxiao commented 3 months ago

Excuse me, where can I find the file "asmv2-13b.jsonl"? I can't find where to download it for evaluation.

Weiyun1025 commented 2 months ago

Sorry for the late response.

You should input `<image>\nWhat is the primary function of these missiles? Please answer the question according to this region` instead of `\nWhat is the primary function of these missiles? Please answer the question according to this region`, so that the image is properly passed into the model.
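For concreteness, a rough sketch of the intended prompt format; `build_prompt` is just an illustrative helper, not part of this repo's API:

```python
def build_prompt(question: str, region: list[int]) -> str:
    # The <image> placeholder marks where the visual tokens are spliced in;
    # per the comment above, omitting it means the image is not fed to the model.
    return (
        "<image>\n"
        f"{question} Please answer the question according to this region: [{region}]."
    )

print(build_prompt("What is the primary function of these missiles?",
                   [529, 530, 590, 622]))
# -> <image>
#    What is the primary function of these missiles? Please answer the question
#    according to this region: [[529, 530, 590, 622]].
```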

janghyuncho commented 2 months ago

This is your annotation, though; the record above comes straight from the released data, not from how I formatted the prompt.