OpenGVLab / VisionLLM

VisionLLM Series
https://arxiv.org/abs/2305.11175
Apache License 2.0

About segmentation outputs #6

Open kahnchana opened 1 year ago

kahnchana commented 1 year ago

Are the segmentation outputs (coordinates) predicted directly by the network as floating-point numbers under a next-token prediction loss? This part is unclear in the paper.

Or are they regressed (using the bin tokens) from anchor points?
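
To make the two alternatives concrete, here is a minimal Python sketch of what each decoding scheme could look like. It is purely illustrative and is not taken from the paper or the repository: the function names, the `NUM_BINS` value, the `"x,y;x,y"` text format, and the offset mapping are all assumptions made for the sake of the question.

```python
# Hypothetical illustration only; nothing here is VisionLLM's actual code.

NUM_BINS = 1000  # assumed number of discrete bin tokens


def decode_direct(generated_text):
    """Alternative 1: the model emits coordinates as plain number text.

    Assumes a made-up format "x,y;x,y;..." with normalized floats.
    """
    points = []
    for pair in generated_text.split(";"):
        x_str, y_str = pair.split(",")
        points.append((float(x_str), float(y_str)))
    return points


def decode_from_anchor(bin_ids, anchor, max_offset):
    """Alternative 2: each bin token encodes an offset from an anchor point.

    Bin 0 maps to -max_offset and bin NUM_BINS - 1 maps to +max_offset
    (an assumed linear mapping).
    """
    ax, ay = anchor
    points = []
    for bx, by in bin_ids:
        dx = (bx / (NUM_BINS - 1) * 2.0 - 1.0) * max_offset
        dy = (by / (NUM_BINS - 1) * 2.0 - 1.0) * max_offset
        points.append((ax + dx, ay + dy))
    return points
```

In the first case the loss would be ordinary cross-entropy over the text tokens that spell out each number; in the second, the cross-entropy would be over bin indices, and the final coordinates would depend on how the anchors are chosen. Which of these (if either) matches the implementation is what I am trying to find out.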