col14m opened this issue 1 month ago
Yes, the current code only supports click-based 3D bounding box outputs, but we will release an update next week that adds support for purely language-guided 3D visual grounding. The code does not yet officially support the 3D Visual Grounding task, which requires an extra grounding head to achieve accurate grounding results. We have tried simply outputting the object's 3D bounding box in text or location-token format for the 3D VG cases, and found that it does not work well~
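For illustration, here is a minimal sketch of what such an extra grounding head could look like: a small MLP that regresses an axis-aligned 3D box from fused language-vision query features, plus a matching score per query. All module names, shapes, and the box parameterization are hypothetical assumptions, not the repo's actual code.

```python
# Hypothetical sketch of a grounding head, NOT the released implementation.
import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    """Regress an axis-aligned 3D box (cx, cy, cz, w, h, d) per object query."""

    def __init__(self, feat_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.box_mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 6),  # 3 values for center, 3 for size
        )
        # Confidence that a query matches the referring expression
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, fused_feats: torch.Tensor):
        # fused_feats: (batch, num_queries, feat_dim) from the multimodal backbone
        boxes = self.box_mlp(fused_feats)   # (batch, num_queries, 6)
        scores = self.score(fused_feats)    # (batch, num_queries, 1)
        return boxes, scores.squeeze(-1)
```

At inference, the box of the highest-scoring query would be taken as the grounded object, which avoids decoding box coordinates from generated text.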
Hello, has the 3D visual grounding module been released yet?
Sorry for the late reply, we will release the 3D VG-related code after the CVPR deadline~ I'm sorry for that and thanks for your understanding~
Hello. Could you please advise me on how to properly train a model for 3D VG on ScanRefer: model, losses, dataset, metrics?
If I understood everything correctly, your current model can predict bounding boxes only as text, and only with an additional click on the object.
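For context on the metrics part of the question: ScanRefer is typically evaluated with Acc@0.25IoU and Acc@0.5IoU, the fraction of referrals whose predicted box overlaps the ground-truth box with 3D IoU above the threshold. Below is a minimal sketch of that metric, assuming axis-aligned boxes in (cx, cy, cz, w, h, d) format; the helper names are mine, not from this repo.

```python
# Sketch of the standard ScanRefer Acc@kIoU metric for axis-aligned 3D boxes.
import numpy as np

def box_iou_3d(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two axis-aligned 3D boxes given as (cx, cy, cz, w, h, d)."""
    a_min, a_max = a[:3] - a[3:] / 2, a[:3] + a[3:] / 2
    b_min, b_max = b[:3] - b[3:] / 2, b[:3] + b[3:] / 2
    # Per-axis overlap, clipped at zero when the boxes are disjoint
    inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter_vol = float(inter.prod())
    union = float(a[3:].prod() + b[3:].prod()) - inter_vol
    return inter_vol / union if union > 0 else 0.0

def acc_at_iou(preds, gts, thresh: float = 0.25) -> float:
    """Fraction of predictions whose IoU with the GT box reaches the threshold."""
    hits = [box_iou_3d(p, g) >= thresh for p, g in zip(preds, gts)]
    return sum(hits) / len(hits)
```

The benchmark reports this at both thresholds (0.25 and 0.5), usually split into "unique" and "multiple" subsets depending on whether the target class appears once or several times in the scene.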