I successfully ran demo/inference.py on the CPU, but it responds slowly. Because a single 3090 GPU does not have enough memory, I attempted to run the model on two GPUs. However, I get an error in Chat.answer(): "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!". Screenshot of the error:
I also printed the model's device map:
I am unsure why this error occurs. I have been trying to fix it all day. Any insights or solutions would be greatly appreciated.
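This error usually means the model's layers are sharded across cuda:0 and cuda:1 (e.g. via a `device_map`), but the input tensors built inside Chat.answer() were placed on a different GPU than the embedding layer that consumes them. A common workaround is to move every input tensor to the device of the model's first shard before calling generate. The sketch below is a hypothetical helper (`move_to_first_device` is not part of the repo); the actual fix inside Chat.answer() may differ:

```python
import torch

def move_to_first_device(batch, device):
    """Recursively move all tensors in a (possibly nested) batch to one device.

    Fix for "Expected all tensors to be on the same device": the inputs
    must live on the GPU that holds the model's first (embedding) layer,
    e.g. the device listed first in model.hf_device_map.
    """
    if torch.is_tensor(batch):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: move_to_first_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(move_to_first_device(v, device) for v in batch)
    return batch  # non-tensor leaves (strings, ints, ...) pass through unchanged
```

Before generation you would then do something like `inputs = move_to_first_device(inputs, first_device)`, where `first_device` is taken from the first entry of `model.hf_device_map` (assuming the model was loaded with `device_map="auto"` from accelerate/transformers).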