What should I do if I want to use tensor_parallel with a GPTQ-quantized model (Llama-2-7b-Chat-GPTQ, for example) to run inference on 2 or more GPUs?
Currently, I am using AutoGPTQ to load the quantized model, and then calling tp.tensor_parallel to distribute the tensors across different devices, roughly as in the sketch below. But I am getting the following error: TypeError: cannot pickle 'module' object
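Here is a minimal sketch of what I am doing. This assumes the TheBloke/Llama-2-7b-Chat-GPTQ checkpoint and the `tensor_parallel` package; the exact path and loading flags in my setup may differ.

```python
from auto_gptq import AutoGPTQForCausalLM
import tensor_parallel as tp

# Load the GPTQ-quantized model with AutoGPTQ onto a single GPU first.
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7b-Chat-GPTQ",  # example checkpoint; adjust to your path
    device="cuda:0",
    use_safetensors=True,
)

# Shard the model across two GPUs; this is the call that raises
# "TypeError: cannot pickle 'module' object".
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```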
Do you have any suggestions on this? Thanks.