I currently have 4 RTX 4090 GPUs, each with 24 GB of memory, but I hit an out-of-memory error when running the 7B model. I changed the model loading to use `torch_dtype=torch.float16`:

```python
PointLLMLlamaForCausalLM.from_pretrained(model_name, low_cpu_mem_usage=True, use_cache=True, torch_dtype=torch.float16).cuda()
```

With that change the model loads, but during inference I get this error:

```
[ERROR] Input type (float) and bias type (c10::Half)
```

I would appreciate your help, thank you.
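For context, here is a minimal sketch of what I'm running. The import path, checkpoint path, tensor shapes, and the `point_clouds` keyword are my best guesses rather than the exact PointLLM API. My suspicion is that the inputs are still float32 while the weights are fp16; does casting the inputs to the model's dtype, as below, look like the right fix?

```python
import torch
from pointllm.model import PointLLMLlamaForCausalLM  # import path assumed

model_name = "path/to/PointLLM_7B"  # substitute the actual checkpoint

# Load the weights in fp16 so the 7B model fits in a single 24 GB card.
model = PointLLMLlamaForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    use_cache=True,
    torch_dtype=torch.float16,
).cuda()

# input_ids and point_clouds come from PointLLM's own tokenizer/preprocessing;
# the dummy values here are only to make the sketch self-contained.
input_ids = torch.randint(0, 32000, (1, 16)).cuda()  # fake prompt tokens
point_clouds = torch.rand(1, 8192, 6)                # fake cloud (xyz + rgb); shape assumed

# This is roughly where the error appears. My guess: the point cloud is still
# float32 while the model's weights/biases are fp16, which would explain
# "Input type (float) and bias type (c10::Half)". Casting the inputs to the
# model's dtype should make them match (the point_clouds keyword in generate()
# is my assumption):
point_clouds = point_clouds.to(device="cuda", dtype=torch.float16)
output_ids = model.generate(input_ids, point_clouds=point_clouds)
```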