Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices
Apache License 2.0
969 stars 65 forks

Can MobileVLM provide the position of an object? #56

Open husnoo opened 1 month ago

husnoo commented 1 month ago

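Can the model return the position of an object in the image? E.g. given the prompt "give the position of the red box", would it answer with coordinates such as "[x1,x2,y1,y2]"?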

Thanks!