kahnchana opened 1 year ago
Are the segmentation outputs (coordinates) predicted directly by the network as floating-point numbers under the next-token prediction loss? This part is quite unclear in the paper.
Or are they regressed from anchor points (using the bin tokens)?
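To make the two readings concrete, here is a hypothetical sketch of what I mean. It is not taken from the paper: the bin count `NUM_BINS`, the normalization of coordinates to [0, 1], and all function names are my own assumptions for illustration.

```python
# Hypothetical sketch of the two readings in the question; NOT the paper's method.
# Assumes coordinates are normalized to [0, 1] and the vocabulary has NUM_BINS bin tokens.

NUM_BINS = 1000  # assumed bin count, for illustration only


def coord_to_bin(x: float, num_bins: int = NUM_BINS) -> int:
    """Reading A: quantize a normalized coordinate into a discrete bin token id.

    The network would then predict this id with ordinary next-token
    (cross-entropy) loss, so it never outputs a float directly.
    """
    x = min(max(x, 0.0), 1.0)  # clamp to the assumed [0, 1] range
    return min(int(x * num_bins), num_bins - 1)  # x == 1.0 falls into the last bin


def bin_to_coord(b: int, num_bins: int = NUM_BINS) -> float:
    """Reading A, decoding: map a bin token id back to the center of its bin."""
    return (b + 0.5) / num_bins


def decode_with_offset(b: int, offset: float, num_bins: int = NUM_BINS) -> float:
    """Reading B: the bin token selects an anchor (the bin center), and a
    separately regressed float offset, in units of one bin width, refines it."""
    return bin_to_coord(b, num_bins) + offset / num_bins


if __name__ == "__main__":
    x = 0.123456
    b = coord_to_bin(x)  # → 123
    print(b, bin_to_coord(b), decode_with_offset(b, -0.044))
```

Under reading A the precision is bounded by the bin width (1/`NUM_BINS`). Under reading B the float offset would presumably need its own regression loss on top of the next-token loss, which is why I'd like to know which one the paper actually uses.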