First, thanks for sharing the code and models. I am wondering how the COCO mAP metric in this paper was calculated from the output of the next-token-prediction decoder. Traditionally, computing mAP requires a confidence score for each predicted box, but how can such a score be obtained from next-token prediction output? Or am I missing something?
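To make the question concrete, here are two ways I could imagine deriving a per-box score from the decoder. Both are purely my guesses, not something I found in the paper or the code, and the function names and inputs (`class_logits`, `token_logprobs`) are hypothetical:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def box_score_from_class_token(class_logits, chosen_id):
    """Guess 1 (assumption): use the probability the decoder assigned
    to the class token it actually emitted for this box."""
    return softmax(class_logits)[chosen_id]

def box_score_from_all_tokens(token_logprobs):
    """Guess 2 (assumption): use the geometric mean of the probabilities
    of all tokens (coordinates and class) emitted for this box."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))
```

Either value could then be passed as the `score` field of each detection for COCO evaluation. Is it something like this, or is a different approach used?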