OpenRobotLab / PointLLM

[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
https://runsenxu.com/projects/PointLLM

Pre-trained Weights #1

Closed — huangjy-pku closed this 10 months ago

huangjy-pku commented 10 months ago

Hello, nice work! Do you have plans for releasing the pre-trained weights of the point cloud encoder Point-BERT? As you mentioned in the paper:

As the original implementation of ULIP-2 only supports point clouds with spatial coordinates (xyz), we re-train Point-BERT with color information (xyzrgb), following the same procedure outlined in the ULIP-2 paper.

I think releasing them would be of great help. Much appreciated :)

RunsenXu commented 10 months ago

Yes, I will release them in one month.

RunsenXu commented 10 months ago

Dear Jiangyong,

I uploaded a checkpoint and its corresponding config at https://drive.google.com/drive/folders/1JB5W8EAaIimWodFoiBuLbkDd3fYOsMqG?usp=sharing .

Best, Runsen

KzZheng commented 3 months ago

Hi Runsen,

I'm curious why there are two checkpoints inside the shared folder. Also, model.point_input_dims is still 3; should it be 6 to include colors? I'm a little confused here.

Also, are these the final weights used for both PointLLM-7B and PointLLM-13B? If so, which one: 7573 or 39?

Looking forward to your reply!

Best, Kz

RunsenXu commented 3 months ago

Hi Kz,

If I remember correctly, the 7573 checkpoint also contains the weights of the image encoder and text encoder of OpenCLIP ViT-L/14 (datacomp xl s13b b90), and the point encoder weights are the same in 7573 and 39 (but I'm sorry, I'm not certain, as they were uploaded a long time ago).

As for the input dimension, the argument in the yaml file is ignored. I set the input dim when initializing the PointTransformer with code like:

        # The input dimension is decided here from args.add_RGB (xyz -> 3, xyzrgb -> 6),
        # overriding whatever the yaml config says.
        self.point_input_dims = 3 if self.args.add_RGB is False else 6
        print(f'point_input_dims: {self.point_input_dims}. Args.add_RGB is {self.args.add_RGB}.')
        self.encoder = Encoder(encoder_channel=self.encoder_dims, point_input_dims=self.point_input_dims)

You can check the weight 'module.point_encoder.encoder.first_conv.0.weight', which has shape torch.Size([128, 6, 1]), i.e., 6 input channels.
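If you want to verify this yourself, here is a minimal sketch (not code from the repo) that loads a downloaded checkpoint and inspects that weight; the local file name and the 'state_dict' unwrapping are assumptions about how the file you downloaded is stored:

    import torch

    # Hypothetical local path to whichever Point-BERT checkpoint you downloaded.
    ckpt = torch.load("point_bert_checkpoint.pt", map_location="cpu")

    # Some checkpoints wrap the weights under a "state_dict" entry; fall back to
    # the loaded object itself otherwise.
    state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt

    w = state_dict["module.point_encoder.encoder.first_conv.0.weight"]
    print(w.shape)  # expected torch.Size([128, 6, 1]) -> 6 input channels (xyzrgb)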

As for which weight is used for PointLLM-7B and PointLLM-13B, you should use the weights here https://huggingface.co/RunsenXu/PointLLM_13B_v1.1_init/blob/main/point_bert_v1.2.pt . (v1.1 is also available)
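For convenience, a small sketch of fetching that file with the huggingface_hub client; the repo id and filename are taken from the URL above, and the top-level structure of the loaded object is something you should inspect rather than assume:

    import torch
    from huggingface_hub import hf_hub_download

    # Download the Point-BERT checkpoint used by PointLLM from the Hugging Face
    # repo linked above (repo id and filename taken from that URL).
    ckpt_path = hf_hub_download(
        repo_id="RunsenXu/PointLLM_13B_v1.1_init",
        filename="point_bert_v1.2.pt",
    )
    state = torch.load(ckpt_path, map_location="cpu")
    print(type(state))  # check the top-level structure before wiring it into the model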

RunsenXu commented 3 months ago

After checking, yes the point encoder weights in 7573 and 39 are the same.
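For reference, a quick sketch of how one could check this; the local file names are hypothetical copies of the two Drive checkpoints, and the 'module.point_encoder.' prefix is taken from the key mentioned above:

    import torch

    # Hypothetical local copies of the two checkpoints from the Drive folder.
    sd_a = torch.load("checkpoint_7573.pt", map_location="cpu")
    sd_b = torch.load("checkpoint_39.pt", map_location="cpu")

    # Compare only the point encoder tensors.
    prefix = "module.point_encoder."
    keys_a = {k for k in sd_a if k.startswith(prefix)}
    keys_b = {k for k in sd_b if k.startswith(prefix)}

    identical = keys_a == keys_b and all(torch.equal(sd_a[k], sd_b[k]) for k in keys_a)
    print("point encoder weights identical:", identical)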

Additionally, it's worth noting that the checkpoint shared here is not the one actually used in PointLLM. When I uploaded it, Jiangyong had stated some additional requirements to me via email, so it differs from the checkpoint used in PointLLM (whether v1.1 or v1.2), although the performance should be fairly close. If you need the one used by PointLLM, please download it here: https://huggingface.co/RunsenXu/PointLLM_13B_v1.1_init/blob/main/point_bert_v1.2.pt

To avoid confusion, I have removed the share link.

KzZheng commented 3 months ago

Thanks for your fast reply!