Closed — dainel40911 closed this issue 4 months ago
For datasets other than Objaverse, I recommend fine-tuning the model for the best results: you just need to collect some point-cloud and text pairs and fine-tune the model, including the point-cloud encoder. This is not strictly necessary, though; you can also use your point clouds directly to generate texts. About 60 GB of GPU memory would be needed for inference.
For inference with your own point clouds, you can refer to https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py
Basically, you only need to prepare your point clouds with shape (B, N, 6) and a text prompt, which should be formatted with the conversation template: https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py#L201
Then feed them to the model: https://github.com/OpenRobotLab/PointLLM/blob/c0c1384379ff2933d5de2bacdfa7d750f5c726ee/pointllm/eval/chat_gradio.py#L222.
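In case it helps, the input preparation can be sketched roughly as below. This is only an illustration of the (B, N, 6) layout (xyz coordinates plus RGB colors, with a batch dimension); `prepare_point_cloud` is a hypothetical helper, and the actual normalization and prompt formatting should follow the repo's data utilities and the conversation template linked above.

```python
import numpy as np

def prepare_point_cloud(xyz: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    """Stack coordinates and colors into the (B, N, 6) layout the model expects.

    Assumes xyz and rgb are both (N, 3). Any normalization (centering,
    unit-sphere scaling, color range) may differ from the repo's own
    preprocessing — check pointllm's data utilities before relying on this.
    """
    pts = np.concatenate([xyz, rgb], axis=1)  # (N, 6): xyz + rgb per point
    return pts[None, ...]                     # add batch dimension -> (1, N, 6)

# Toy usage with 8192 random points.
xyz = np.random.rand(8192, 3).astype(np.float32)
rgb = np.random.rand(8192, 3).astype(np.float32)
batch = prepare_point_cloud(xyz, rgb)  # shape (1, 8192, 6)
```

The resulting array, converted to a tensor, would be passed to the model together with the templated text prompt.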
Hi, Is there anything I can help with?
I really want to use your model to generate text descriptions for my own dataset; however, I don't have enough GPU power for fine-tuning or inference. Anyway, I still appreciate your thorough reply.
There is a workaround: you can load the LLM backbone in int8, which greatly reduces the required GPU memory. I haven't tried that myself, though, so I don't know how the model performs in that case. You may refer to https://huggingface.co/docs/transformers/quantization
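A minimal sketch of what that could look like with Hugging Face Transformers, assuming the backbone is a standard HF-compatible causal LM checkpoint (the checkpoint path below is a placeholder, not the actual PointLLM weights location, and `weight_memory_gib` is just a back-of-envelope helper):

```python
def load_int8_backbone(checkpoint: str):
    """Load a causal-LM checkpoint with int8 weights.

    Requires `transformers`, `bitsandbytes`, and a CUDA GPU.
    `checkpoint` is a placeholder — substitute the real PointLLM weights path.
    """
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    return AutoModelForCausalLM.from_pretrained(
        checkpoint,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place layers across available devices
    )

def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GiB (ignores activations/KV cache)."""
    return n_params * bytes_per_param / 2**30

# For a 7B-parameter backbone, weights alone drop from ~13 GiB (fp16, 2 bytes
# per parameter) to ~6.5 GiB (int8, 1 byte per parameter).
fp16_gib = weight_memory_gib(7e9, 2)
int8_gib = weight_memory_gib(7e9, 1)
```

Note the estimate covers weights only; activations and the KV cache add more on top, and int8 quantization can cost some accuracy.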
Hello,
Thanks for the great work.
Is there any instruction or guidance for generating text descriptions for my own dataset (a subset of ShapeNet)? Also, how much GPU memory will I need for this task?
Thanks, Daniel Wu