Ucas-HaoranWei / Vary-toy

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

How are detected coordinates handled, and how are they normalized? #28

Closed: LinJM closed 3 months ago

LinJM commented 3 months ago

As the title asks, how does this model process coordinate information in object detection? Are the coordinates normalized to the range 0 to 1? Do they need to be expressed relative to the input resolution of the Tiny module?

Ucas-HaoranWei commented 3 months ago

The coordinates are first normalized to the range 0 to 1, and the normalized values are then multiplied by 1000, as stated in the report.
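For readers who want to see the arithmetic, here is a minimal sketch of that scaling. It is not taken from the Vary-toy code. The function names, the `(x1, y1, x2, y2)` pixel box format, dividing by the image's own width and height, and rounding to integers are all assumptions made for illustration; check the report and the repository for the exact convention.

```python
from typing import Sequence, Tuple

SCALE = 1000  # the factor mentioned in the reply above


def normalize_box(box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a pixel box (x1, y1, x2, y2) to the 0..1000 range.

    Assumption: coordinates are divided by the image width/height,
    multiplied by 1000, and rounded to the nearest integer.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    x1, y1, x2, y2 = box
    return (
        int(round(x1 * SCALE / width)),
        int(round(y1 * SCALE / height)),
        int(round(x2 * SCALE / width)),
        int(round(y2 * SCALE / height)),
    )


def denormalize_box(box: Sequence[float], width: int, height: int) -> Tuple[float, float, float, float]:
    """Map a 0..1000 box back to pixel coordinates of a width x height image."""
    x1, y1, x2, y2 = box
    return (
        x1 * width / SCALE,
        y1 * height / SCALE,
        x2 * width / SCALE,
        y2 * height / SCALE,
    )


if __name__ == "__main__":
    # A 400x800 image with a box from (50, 100) to (200, 400) in pixels.
    print(normalize_box((50, 100, 200, 400), width=400, height=800))
    # -> (125, 125, 500, 500)
    print(denormalize_box((125, 125, 500, 500), width=400, height=800))
    # -> (50.0, 100.0, 200.0, 400.0)
```

Because the values are relative to the image size, the same 0 to 1000 box can be mapped back onto an image of any resolution with `denormalize_box`.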

LinJM commented 3 months ago

thanks