OpenGVLab / VisionLLM

VisionLLM Series
https://arxiv.org/abs/2305.11175
Apache License 2.0

About object detection #7

Open chagmgang opened 5 months ago

chagmgang commented 5 months ago

I think you feed a token sequence like the one below into the LLM:

['<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', ...]

Regarding the object detection loss, did you use Hungarian matching as in DETR?

Or, if you use only next-token prediction with a cross-entropy loss, how do you order the ground-truth boxes?
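
To make the second question concrete, here is a minimal sketch of what ordering the ground truth could look like when the target is a flat token sequence. It is not taken from the VisionLLM code: the function names (`quantize`, `serialize`), the `<bin_k>` coordinate tokens, the 1000-bin quantization, and the three ordering options are all assumptions for illustration only.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical box format: (class_name, x1, y1, x2, y2), coordinates normalized to [0, 1].
Box = Tuple[str, float, float, float, float]


def quantize(v: float, num_bins: int = 1000) -> int:
    """Map a normalized coordinate in [0, 1] to an integer bin index in [0, num_bins - 1]."""
    v = min(max(v, 0.0), 1.0)  # clamp out-of-range values
    return min(int(v * num_bins), num_bins - 1)


def serialize(
    boxes: Sequence[Box],
    order: str = "position",
    num_bins: int = 1000,
    seed: Optional[int] = None,
) -> List[str]:
    """Flatten ground-truth boxes into [<cls>, <x1>, <y1>, <x2>, <y2>, ...] target tokens.

    order="position": top-to-bottom, then left-to-right (by y1, then x1)
    order="area":     largest box first
    order="random":   shuffled, so no fixed order is imposed on the model
    """
    boxes = list(boxes)  # copy so the caller's list is not modified
    if order == "position":
        boxes.sort(key=lambda b: (b[2], b[1]))
    elif order == "area":
        boxes.sort(key=lambda b: (b[3] - b[1]) * (b[4] - b[2]), reverse=True)
    elif order == "random":
        random.Random(seed).shuffle(boxes)
    else:
        raise ValueError(f"unknown order: {order!r}")

    tokens: List[str] = []
    for cls, x1, y1, x2, y2 in boxes:
        tokens.append(f"<{cls}>")
        tokens.extend(f"<bin_{quantize(c, num_bins)}>" for c in (x1, y1, x2, y2))
    return tokens


if __name__ == "__main__":
    gt = [("dog", 0.5, 0.5, 0.75, 0.75), ("cat", 0.125, 0.25, 0.5, 0.5)]
    print(serialize(gt, order="position"))
    print(serialize(gt, order="random", seed=0))
```

With a plain cross-entropy loss, whichever order is chosen here becomes the order the model is trained to emit, which is why I am asking which one you used (or whether set-based matching avoids the issue entirely).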