ZCMax / LLaVA-3D

A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World

Training model for 3D VG #7


col14m commented 1 month ago

Hello. Could you please advise me on how to properly train a model for 3D visual grounding (3D VG) on ScanRefer: model, losses, dataset, and metrics?

If I understood correctly, your current model can predict bounding boxes only as text, and only with an additional click on the object.
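For context on the metrics part of the question: ScanRefer is commonly evaluated with Acc@0.25 and Acc@0.5, i.e. the fraction of queries whose predicted 3D box has IoU with the ground-truth box above the threshold. Below is a minimal sketch of that metric, assuming axis-aligned boxes in (center, size) format; the function names `box3d_iou` and `scanrefer_accuracy` are hypothetical, not from this repo:

```python
import numpy as np

def box3d_iou(box_a, box_b):
    """Axis-aligned 3D IoU between boxes given as (cx, cy, cz, dx, dy, dz)."""
    a_min = np.asarray(box_a[:3]) - np.asarray(box_a[3:]) / 2.0
    a_max = np.asarray(box_a[:3]) + np.asarray(box_a[3:]) / 2.0
    b_min = np.asarray(box_b[:3]) - np.asarray(box_b[3:]) / 2.0
    b_max = np.asarray(box_b[:3]) + np.asarray(box_b[3:]) / 2.0
    # Overlap along each axis, clipped at zero when boxes do not intersect.
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0.0, None))
    vol_a = np.prod(a_max - a_min)
    vol_b = np.prod(b_max - b_min)
    return inter / (vol_a + vol_b - inter + 1e-8)

def scanrefer_accuracy(pred_boxes, gt_boxes, thresholds=(0.25, 0.5)):
    """Acc@kIoU: fraction of queries whose predicted box exceeds IoU threshold k."""
    ious = np.array([box3d_iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return {f"Acc@{t}": float((ious >= t).mean()) for t in thresholds}
```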

ZCMax commented 1 month ago

Yes, the current code only supports click-based 3D bounding box outputs, but we plan to release an update next week that adds support for purely language-guided 3D visual grounding. The code does not yet officially support the 3D Visual Grounding task, which requires an extra grounding head to produce accurate grounding results. We have tried simply outputting the 3D bounding box of the object in text or location-token format for the 3D VG cases, and found that it does not work well~
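For illustration of what such an extra grounding head could look like (this is not the released LLaVA-3D design): one common pattern is to project the LLM hidden state at a dedicated grounding token into a shared embedding space, score it against per-object proposal features, and return the best-matching proposal's box. A minimal PyTorch sketch; all module names and dimensions below are assumptions:

```python
import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    """Illustrative grounding head: matches an LLM token embedding to 3D proposals.

    hidden_dim / proposal_dim / embed_dim are hypothetical sizes,
    not values taken from LLaVA-3D.
    """
    def __init__(self, hidden_dim=4096, proposal_dim=256, embed_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(hidden_dim, embed_dim)   # LLM token -> shared space
        self.obj_proj = nn.Linear(proposal_dim, embed_dim)  # proposal feat -> shared space

    def forward(self, grd_token, proposal_feats, proposal_boxes):
        # grd_token: (B, hidden_dim), LLM hidden state at the grounding token
        # proposal_feats: (B, N, proposal_dim); proposal_boxes: (B, N, 6)
        q = self.text_proj(grd_token).unsqueeze(1)   # (B, 1, E)
        k = self.obj_proj(proposal_feats)            # (B, N, E)
        logits = (q * k).sum(-1)                     # (B, N) similarity scores
        best = logits.argmax(-1)                     # index of best-matching proposal
        boxes = proposal_boxes[torch.arange(len(best)), best]
        return logits, boxes

# Training would typically apply cross-entropy over the N proposals, taking the
# proposal with highest IoU to the ground-truth box as the positive label.
```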

xjj1999 commented 1 month ago

Hello, has the 3D visual grounding module been released yet?

ZCMax commented 3 weeks ago

Sorry for the late reply. We will release the 3D VG-related code after the CVPR deadline~ I'm sorry for the delay, and thanks for your understanding~