ch3cook-fdu / Vote2Cap-DETR

[CVPR 2023] Vote2Cap-DETR and [T-PAMI 2024] Vote2Cap-DETR++; A set-to-set perspective towards 3D Dense Captioning; State-of-the-Art 3D Dense Captioning methods
MIT License

Question for ScanRefer benchmark, not Scan2Cap #15

Open jkstyle2 opened 3 months ago

jkstyle2 commented 3 months ago

Dear authors, I am wondering why the paper says that Vote2Cap is tested on the ScanRefer benchmark rather than the Scan2Cap benchmark. As far as I understand, ScanRefer takes a point cloud plus a text query as input and localizes the unique 3D box being referred to. Scan2Cap, on the other hand, takes only a point cloud as input and estimates 3D boxes along with their descriptions. I think Vote2Cap addresses a task like Scan2Cap, yet the paper states it is evaluated on ScanRefer.

Did you also evaluate your model on the ScanRefer benchmark? If so, could you share how that works, since the ScanRefer task requires two inputs, the point cloud scene and a query? If it was actually tested on Scan2Cap, is there any way to test your model on ScanRefer?

Thanks for your help in advance!

ch3cook-fdu commented 3 months ago

Though the task is called Scan2Cap, we refer to the different benchmarks by their dataset names (i.e., ScanRefer or Nr3D).

Our method is not designed for 3D visual grounding. However, if you are interested, you can make some modifications for that task.

jkstyle2 commented 3 months ago

Would you mind suggesting how to modify it to work for the 3D visual grounding task on the ScanRefer dataset? It seems it would work very well on the 3D grounding task.

ch3cook-fdu commented 3 months ago

You can choose to either 1) match the generated instance captions with the query sentences, or 2) match the generated bounding-box features with the query sentences, as existing 3DVG models do.
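
A minimal sketch of option 1), not part of this repository: it assumes you already have the per-proposal captions and boxes predicted by the captioning head, and uses an off-the-shelf sentence encoder (here `sentence-transformers`, an assumption for illustration) to score each caption against the query and pick the best-matching box.

```python
# Sketch of caption-to-query matching for grounding (option 1).
# Assumptions: `captions` are the sentences decoded per proposal,
# `boxes` are the corresponding predicted 3D boxes, and
# sentence-transformers provides the text encoder.
import numpy as np
from sentence_transformers import SentenceTransformer

def ground_query(query, captions, boxes, encoder=None):
    """Return the box whose generated caption best matches the query sentence."""
    encoder = encoder or SentenceTransformer("all-MiniLM-L6-v2")
    # Embed the query and all proposal captions in a shared space.
    embeddings = encoder.encode([query] + captions, normalize_embeddings=True)
    query_emb, caption_embs = embeddings[0], embeddings[1:]
    # With normalized embeddings, cosine similarity is a dot product.
    scores = caption_embs @ query_emb
    best = int(np.argmax(scores))
    return boxes[best], captions[best], float(scores[best])
```

Option 2) would instead keep the box query features before caption decoding and train a lightweight matching head against the encoded query sentence, which is closer to how existing 3DVG models are built.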