BlackSamorez / tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference
MIT License

Can I use tensor_parallel for inference with a GPTQ-quantized model? #131

Open minlik opened 1 year ago

minlik commented 1 year ago

What should I do if I want to use tensor_parallel with a GPTQ-quantized model (Llama-2-7b-Chat-GPTQ, for example) for inference on 2 or more GPUs?

Currently, I am using AutoGPTQ to load the quantized model, and then tp.tensor_parallel to distribute the tensors across different devices. But I am getting the following error: TypeError: cannot pickle 'module' object
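For reference, here is a minimal sketch of the setup that triggers the error. The checkpoint path, the use_safetensors flag, and the device list are assumptions; substitute your own checkpoint and GPUs:

```python
import tensor_parallel as tp
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

# Assumed checkpoint path; replace with your local or hub copy of Llama-2-7b-Chat-GPTQ
model_path = "TheBloke/Llama-2-7b-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Load the GPTQ-quantized model with AutoGPTQ
model = AutoGPTQForCausalLM.from_quantized(
    model_path,
    device="cuda:0",
    use_safetensors=True,  # assumption: checkpoint is stored as safetensors
)

# Attempt to shard the quantized model across two GPUs;
# this is the call that raises: TypeError: cannot pickle 'module' object
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```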

Do you have any suggestions on this? Thanks.