OpenRobotLab / PointLLM

[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
https://runsenxu.com/projects/PointLLM

Any benchmark on ScanQA dataset? #17

Closed jkstyle2 closed 6 months ago

jkstyle2 commented 6 months ago

Hello~ thanks for sharing your great work! I'm wondering if there is any benchmark test for scene understanding on datasets like ScanQA, as was done by 3D-LLM.

RunsenXu commented 6 months ago

Hi,

We don't have results for that at the moment. Collecting truly high-quality training data is much more complicated for 3D scenes. We are working on this and will include testing on ScanQA in the future.

Nevertheless, it's possible to get a baseline result right now, since PointLLM can accept ScanNet point clouds as well. However, the vanilla PointLLM was not trained on scene-level point clouds or on the ScanQA question-answer format, so you may need to use ChatGPT to post-process the output and check whether the model's answer is correct. I expect the accuracy to be low, though.
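To feed a ScanNet scene into an object-level model like this, the point cloud typically has to be downsampled to a fixed point count and normalized the same way the training data was. Below is a minimal sketch of such a preprocessing step; the point count (8192), the xyz+rgb channel layout, and the unit-sphere normalization are assumptions about PointLLM's input convention, not confirmed details from this thread:

```python
import numpy as np

def prepare_scene_pointcloud(points: np.ndarray, n_points: int = 8192) -> np.ndarray:
    """Downsample and normalize a scene point cloud for an object-level model.

    points: (N, 6) array of xyz + rgb. n_points=8192 is an assumed input size.
    Returns an (n_points, 6) float32 array with xyz in the unit sphere
    and rgb in [0, 1].
    """
    assert points.ndim == 2 and points.shape[1] == 6, "expected (N, 6) xyz+rgb"
    # Random subsampling; sample with replacement only if the scene is too small.
    idx = np.random.choice(len(points), n_points, replace=len(points) < n_points)
    pts = points[idx].astype(np.float32)
    # Center xyz and scale it into the unit sphere (common object-level convention).
    xyz = pts[:, :3]
    xyz -= xyz.mean(axis=0)
    xyz /= np.max(np.linalg.norm(xyz, axis=1)) + 1e-8
    pts[:, :3] = xyz
    # Map 0-255 colors to [0, 1] if they are not normalized already.
    if pts[:, 3:].max() > 1.0:
        pts[:, 3:] /= 255.0
    return pts
```

Note that uniform random sampling over a whole scene is much cruder than sampling a single object, which is one concrete reason to expect a domain gap.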

Best, Runsen

jkstyle2 commented 6 months ago

Thanks for the considerate reply! May I ask what 'high-quality' data means here? Also, how is Objaverse different from ScanQA from PointLLM's point of view? I don't understand why you expect the model to do inference well on Objaverse but not on ScanQA.

RunsenXu commented 6 months ago

High-quality means:

  1. Accurate
  2. Covers a scene's information as richly as possible, including its layout, appearance, style, functionality, affordance, and more. A scene contains numerous elements and is much more complicated than a single object.
  3. Suitable for the model to learn, which is an open problem.

PointLLM never saw any ScanNet data during training, so there will be a domain gap, especially because Objaverse contains object-level point clouds while ScanNet contains scene-level point clouds. They are different.