OpenGVLab / LAMM

[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
https://openlamm.github.io/

EPCL pretrained model details #16

Closed ZCMax closed 1 year ago

ZCMax commented 1 year ago

Thanks for your code. How was the EPCL pretrained model obtained? For example, which training datasets and training approach were used? Since the checkpoint name includes "scannet", was it trained on the ScanNet dataset?

wangjiongw commented 1 year ago

Hi, thanks for your issue.

The EPCL checkpoint we used follows the methodology from FrozenCLIP. Since 3D datasets are limited, we follow the setting in the paper and use the checkpoint pretrained on ScanNet, which was trained for the 3D detection task.

ZCMax commented 1 year ago

Thanks for your reply. My next question: since the pretrained checkpoint was trained for the 3D detection task on ScanNet, can the 3D benchmark on ScanNet still be regarded as zero-shot?

wangjiongw commented 1 year ago

Thanks. This method is limited by the pretrained encoders currently available in 3D vision. Compared with the 2D setting, the EPCL encoder was indeed pretrained on ScanNet data. In the LAMM framework, however, ScanNet data is not exposed to the LLM decoder, which is the major part of the framework.

Later, we will try testing with other point cloud encoders. Contributions are also welcome.

wangjiongw commented 1 year ago

This issue will be closed since there is no further discussion. Please reopen it if necessary.