ZiyuGuo99 / Point-Bind_Point-LLM

Align 3D Point Cloud with Multi-modalities for Large Language Models

Some questions about the downstream tasks of Point-Bind #9

Open SARIHUST opened 9 months ago

SARIHUST commented 9 months ago

Hi, thanks for sharing your amazing work. After going through your paper and some related work, I have a few questions I hope you can shed some light on. They mainly concern the downstream utilization of Point-Bind.

Thanks in advance, and thanks again for sharing your work.

ZiyuGuo99 commented 9 months ago

@SARIHUST Thanks for your interest and in-depth comments! We hope our response helps.

  1. Our final goal is to construct a general joint embedding space (ImageBind & Point-Bind) that incorporates the 3D modality into the existing any-to-any framework. Any-to-3D generation is just an initial attempt, and we also utilize Point-Bind's features for point-to-mesh generation. For example, we have achieved 3D-to-2D generation by feeding Point-Bind's features to a 2D diffusion decoder (see the shared-space sketch after this list). We will add further experiments on 'any-to-any with 3D' using Point-Bind's features in a future version.
  2. We are from the same research group as ImageBind-LLM's authors. ImageBind-LLM can be viewed as a summary paper of many of our multi-modality instruction-tuning works, including Point-LLM.
  3. Indeed, our advantage is being free from any 3D instruction data, which saves considerable data-collection and tuning resources. The visual cache model effectively reduces the 2D-3D gap of the encoded features before they reach the LLM (see the cache sketch below), and the color information from 2D images is a trade-off for our data/tuning-free efficiency. We have also experimented with 3D instruction tuning in follow-up works, which can effectively alleviate this limitation. Thanks!
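
To make point 1 concrete, here is a minimal, hypothetical sketch of how a shared embedding space enables "any-to-any with 3D". The `DummyPointEncoder` / `DummyImageEncoder` modules and the dimension `D` are illustrative stand-ins, not Point-Bind's or ImageBind's actual modules or API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 1024  # shared embedding dimension (assumed for illustration)

class DummyPointEncoder(nn.Module):
    """Stand-in for Point-Bind's 3D encoder: (B, N, 3) points -> (B, D)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3, D)

    def forward(self, pts):
        return self.proj(pts).max(dim=1).values  # simple max-pool over points

class DummyImageEncoder(nn.Module):
    """Stand-in for ImageBind's image encoder: (B, 3, H, W) -> (B, D)."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(3 * 224 * 224, D)

    def forward(self, imgs):
        return self.proj(imgs.flatten(1))

point_enc, image_enc = DummyPointEncoder(), DummyImageEncoder()
pts = torch.randn(2, 1024, 3)       # two point clouds
imgs = torch.randn(4, 3, 224, 224)  # four candidate images

# Both modalities land in the same D-dimensional space after normalization.
z_3d = F.normalize(point_enc(pts), dim=-1)
z_2d = F.normalize(image_enc(imgs), dim=-1)

# Cross-modal retrieval: cosine similarity in the shared space.
sim = z_3d @ z_2d.t()      # (2, 4) 3D-to-2D similarity matrix
print(sim.argmax(dim=-1))  # nearest image for each point cloud

# 3D-to-2D generation would instead pass z_3d as the conditioning vector of a
# pre-trained 2D diffusion decoder, in place of its usual image/text embedding.
```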
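
And for point 3, a hedged sketch of the retrieval-based visual cache idea, assuming the cache stores ImageBind-encoded 2D image features: the 3D query feature retrieves its most similar 2D features and is interpolated toward them, pulling it closer to the 2D distribution the LLM was tuned on. The function name `cache_enhance`, `top_k`, and the `beta` mixing scheme are illustrative, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def cache_enhance(feat_3d: torch.Tensor,
                  cache_2d: torch.Tensor,
                  top_k: int = 3,
                  beta: float = 0.5) -> torch.Tensor:
    """feat_3d: (B, D) Point-Bind feature; cache_2d: (M, D) stored image features."""
    q = F.normalize(feat_3d, dim=-1)
    keys = F.normalize(cache_2d, dim=-1)
    sim = q @ keys.t()                    # (B, M) cosine similarities
    w, idx = sim.topk(top_k, dim=-1)      # top-k most similar 2D features
    w = F.softmax(w, dim=-1)              # normalize retrieval weights
    retrieved = torch.einsum('bk,bkd->bd', w, keys[idx])  # weighted 2D feature
    # Interpolate between the raw 3D feature and its retrieved 2D counterpart.
    return beta * q + (1.0 - beta) * retrieved

# Example: a cache of 10k image features and a batch of two 3D features.
cache = torch.randn(10_000, 1024)
feat = torch.randn(2, 1024)
enhanced = cache_enhance(feat, cache)
print(enhanced.shape)  # torch.Size([2, 1024])
```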