Open · jkstyle2 opened 3 months ago

Dear authors, I am wondering why the paper says Vote2Cap is tested on ScanRefer rather than on the Scan2Cap benchmark. As I understand it, ScanRefer takes a point cloud and a text query as input and localizes the referred unique 3D box, whereas Scan2Cap takes only a point cloud as input and predicts 3D boxes together with descriptions. Vote2Cap appears to perform the latter task, yet the paper states it is evaluated on ScanRefer.

Did you also evaluate your model on the ScanRefer benchmark? If so, could you share how that works, since ScanRefer requires two inputs, the point cloud scene and the query? And if it was actually tested on Scan2Cap, is there a way to test your model on ScanRefer?

Thanks for your help in advance!

Though the task is called Scan2Cap, we identify the different benchmarks by their dataset names (i.e. ScanRefer or Nr3D). Our method is not designed for 3D visual grounding. However, if you are interested, you can modify it for that task.

Would you mind suggesting how to modify it for the 3D visual grounding task on the ScanRefer dataset? It seems to work very well on 3D grounding.

You can choose to 1) match the generated instance captions against the query sentences, or 2) match the generated bounding-box features against the query sentences, as existing 3DVG models do.