Open chagmgang opened 5 months ago
I think you feed the following token sequence into the LLM:
['<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', '<cls>', '<x1>', '<y1>', '<x2>', '<y2>', ...]
For the object detection loss, did you use Hungarian matching as in DETR?
Or, if you use plain next-token prediction with cross-entropy loss, how do you order the ground-truth boxes?
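
To make the ordering question concrete, here is a minimal sketch of the kind of serialization I have in mind. The function name `serialize_boxes`, the 1000-bin coordinate quantization, and the top-to-bottom, left-to-right sort are all my own assumptions for illustration; I don't know whether the repo does anything like this.

```python
from typing import List, Tuple

# Hypothetical box format: (class_name, x1, y1, x2, y2), coordinates normalized to [0, 1].
Box = Tuple[str, float, float, float, float]


def serialize_boxes(boxes: List[Box], num_bins: int = 1000) -> List[str]:
    """Flatten boxes into a [<cls>, <x1>, <y1>, <x2>, <y2>, ...] token list.

    Assumed ordering: sort by y1, then x1 (top-to-bottom, left-to-right),
    so the next-token target is deterministic for a given image.
    """
    ordered = sorted(boxes, key=lambda b: (b[2], b[1]))
    tokens: List[str] = []
    for cls, x1, y1, x2, y2 in ordered:
        tokens.append(f"<{cls}>")
        for value in (x1, y1, x2, y2):
            # Quantize each coordinate into one of num_bins discrete tokens.
            bin_idx = min(int(value * num_bins), num_bins - 1)
            tokens.append(f"<{bin_idx}>")
    return tokens


if __name__ == "__main__":
    gt = [("dog", 0.5, 0.75, 1.0, 1.0), ("cat", 0.125, 0.25, 0.5, 0.5)]
    print(serialize_boxes(gt))
```

With a fixed ordering like this, plain cross-entropy has one unambiguous target sequence; without one, it seems some set-based matching (such as Hungarian matching) would be needed. Which of the two did you use?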