OpenRobotLab / PointLLM

[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
https://runsenxu.com/projects/PointLLM

About own dataset #24

Closed dainel40911 closed 4 months ago

dainel40911 commented 4 months ago

Hello,

Thanks for the great work.

Is there any instruction or guidance for generating text descriptions for my own dataset (a subset of ShapeNet)? Also, how much GPU memory will I need for this task?

Thanks, Daniel Wu

RunsenXu commented 4 months ago

For datasets other than Objaverse, I recommend fine-tuning the model for the best results. You just need to collect some point-cloud/text pairs and fine-tune the model, including the point-cloud encoder. But this is not strictly necessary, and you can directly feed your point clouds to the model to generate texts. About 60 GB of GPU memory would be needed for inference.
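As a rough illustration, a fine-tuning pair could look like the sketch below. The field names are assumptions modeled on common instruction-tuning annotation files; check the repo's released data files for the exact schema and point-token placeholder.

```python
# Hypothetical point-cloud/text pair for fine-tuning on a custom dataset.
# The schema here (object_id + conversations) is an assumption; verify it
# against PointLLM's released annotation JSON before building your data.
sample = {
    "object_id": "my_shapenet_chair_0001",  # maps to a saved (N, 6) point cloud file
    "conversations": [
        {"from": "human", "value": "What is this object?"},
        {"from": "gpt", "value": "A wooden chair with four legs and a curved backrest."},
    ],
}
```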

RunsenXu commented 4 months ago

For inference with your own point clouds, you can refer to https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py. Basically, you only need to prepare your point clouds with shape (B, N, 6) and a text prompt, which should be formatted with the conversation template: https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py#L201

Then feed them to the model: https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py#L222.
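Putting the two steps together, here is a minimal sketch. It assumes `model`, `tokenizer`, and the conversation template `conv` are loaded the way chat_gradio.py loads them; the preprocessing and the generate kwargs are illustrative, not the repo's exact API.

```python
import numpy as np
import torch

def describe_point_cloud(model, tokenizer, conv, npy_path: str) -> str:
    """Sketch of PointLLM inference on one custom point cloud.

    Assumes model/tokenizer/conv are set up as in chat_gradio.py; the
    normalization and generate kwargs below may differ from the repo's.
    """
    # Load the point cloud as (N, 6): xyz coordinates + rgb colors.
    points = np.load(npy_path).astype(np.float32)
    points[:, :3] -= points[:, :3].mean(axis=0)   # center the xyz coordinates
    points[:, :3] /= np.abs(points[:, :3]).max()  # scale to roughly unit size

    point_clouds = torch.from_numpy(points).unsqueeze(0).cuda()  # (B, N, 6), B = 1

    # Format the question with the conversation template, as in chat_gradio.py.
    conv.append_message(conv.roles[0], "What is this object?")
    conv.append_message(conv.roles[1], None)
    input_ids = tokenizer(conv.get_prompt(), return_tensors="pt").input_ids.cuda()

    # Feed the token ids and the point cloud to the model together.
    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            point_clouds=point_clouds,
            do_sample=True,
            temperature=0.2,
            max_length=2048,
        )
    # Decode only the newly generated tokens.
    return tokenizer.decode(output_ids[0, input_ids.shape[1]:], skip_special_tokens=True)
```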

RunsenXu commented 4 months ago

Hi, is there anything else I can help with?

dainel40911 commented 4 months ago

I really want to use your model to generate text descriptions for my own dataset; however, I don't have enough GPU resources to fine-tune or run inference. Anyway, I still appreciate your thorough reply.

RunsenXu commented 4 months ago

There is a workaround: you can load the LLM backbone in int8 format, which greatly reduces the required GPU memory. But I haven't tried that myself and can't speak to the model's performance in that case. You may refer to https://huggingface.co/docs/transformers/quantization
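For example, a minimal sketch using the transformers quantization API. The model class import and checkpoint name are assumptions (check pointllm/model and the Hugging Face hub for the exact names), and depending on your transformers version you may pass `load_in_8bit=True` directly to `from_pretrained` instead.

```python
import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
# Assumed import: PointLLM's causal-LM wrapper around the LLaMA backbone.
from pointllm.model import PointLLMLlamaForCausalLM

model_path = "RunsenXu/PointLLM_7B_v1.2"  # assumed released checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = PointLLMLlamaForCausalLM.from_pretrained(
    model_path,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # int8 weights via bitsandbytes
    torch_dtype=torch.float16,  # dtype for the modules that stay unquantized
    device_map="auto",          # spread layers across available GPUs automatically
)
```

Note that quantization typically applies only to the LLM's linear layers; the point-cloud encoder may still run in full or half precision, so budget a little extra memory for it.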