Open ShelterWFF opened 3 months ago
reduce transformers version
The default loading of the model in transformers seems to have changed recently. For now, you can just use device_map when needed.
Similar issue with following environments:
transformers 4.42.4
AutoAWQ 0.2.6+cu118
AutoAWQ_Kernels 0.0.6+cu118
loading with device_map auto
model = AutoAWQForCausalLM.from_pretrained(config.model_path, device_map="auto", safetensors=True)
error solved by specifying device
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
But what if the model is larger than 80 GB (e.g. qwen2-72b)?
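To put rough numbers on that question, here is a back-of-the-envelope sketch of the memory needed for the weights alone. The 72e9 parameter count is a nominal assumption taken from the model name, the helper names are hypothetical, and the estimate ignores activations, calibration data, and framework overhead, so real usage will be higher.

```python
def estimated_weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits_on_one_gpu(num_params: float, bytes_per_param: float,
                    gpu_memory_gb: float = 80.0) -> bool:
    """True if the weights alone would fit in a single GPU of the given size."""
    return estimated_weight_gb(num_params, bytes_per_param) <= gpu_memory_gb


# Assumption: a nominal 72 billion parameters for a "72b" model.
params_72b = 72e9

fp16_gb = estimated_weight_gb(params_72b, 2)    # 2 bytes per fp16 weight
int4_gb = estimated_weight_gb(params_72b, 0.5)  # 4 bits per quantized weight

print(f"fp16 weights: ~{fp16_gb:.0f} GB, fits on one 80 GB GPU: "
      f"{fits_on_one_gpu(params_72b, 2)}")
print(f"4-bit weights: ~{int4_gb:.0f} GB, fits on one 80 GB GPU: "
      f"{fits_on_one_gpu(params_72b, 0.5)}")
```

Under these assumptions the unquantized fp16 weights do not fit on a single 80 GB card, so pinning everything to one device is not an option while quantizing; the model has to be spread across GPUs or partly offloaded.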
convert meta-llama/Meta-Llama-3.1-70B-Instruct
transformers must be upgraded to 4.43.x. When I use 4.43.3, I get the same error.
@billvsme I'm using meta-llama/Meta-Llama-3.1-70B-Instruct
and I got the same error even when I tried transformers==4.43.3 and 4.44.0. Do I need to specify my entire env?
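For anyone who wants to share their environment, a small sketch like the following collects the relevant package versions using only the standard library. The distribution names in the list are assumptions about how the packages are published; adjust them to match your install.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {package: installed version, or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Assumed distribution names; edit to match your environment.
packages_of_interest = ["transformers", "autoawq", "autoawq-kernels", "torch"]

env_report = collect_versions(packages_of_interest)
for name, ver in env_report.items():
    print(f"{name}: {ver}")
```

Pasting this output alongside the traceback makes it easier to compare against the version combinations already reported in this thread.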
Same issue. @r4dm's solution doesn't work for me, as I'm trying to quantize a fine-tuned Llama 3.1 model.
Unfortunately simply installing transformers==4.42.4 doesn't work for Llama3.1 as this reintroduces an issue with rope_scaling.
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'factor': 8.0, 'high_freq_factor': 4.0, 'low_freq_factor': 1.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3'}
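The error above can be illustrated with a simplified re-creation of a two-field check. This is a sketch written from the error message only, not the library's actual code, and `legacy_rope_scaling_check` is a hypothetical name. It shows why the Llama 3.1 config, which carries five fields, is rejected by a validator that expects exactly type and factor.

```python
def legacy_rope_scaling_check(rope_scaling):
    """Illustrative stand-in for a validator that accepts only two fields."""
    if rope_scaling is None:
        return
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "rope_scaling must be a dictionary with two fields, type and factor, "
            f"got {rope_scaling}"
        )


# The rope_scaling value quoted in the error message above.
llama31_rope_scaling = {
    "factor": 8.0,
    "high_freq_factor": 4.0,
    "low_freq_factor": 1.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}

try:
    legacy_rope_scaling_check(llama31_rope_scaling)
    rejected = False
except ValueError as err:
    rejected = True
    print(f"ValueError: {err}")

# A two-field dictionary in the older style passes the same check.
legacy_rope_scaling_check({"type": "linear", "factor": 2.0})
```

This is why downgrading trades one failure for another: the older release does not understand the newer config format, regardless of how the devices are mapped.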
Setting device_map="auto" in the model loading unfortunately doesn't work with latest transformers.
For anyone watching this, consider also tracking this issue in transformers: #32420
Same issue, but if you have enough VRAM or multiple GPUs, you can set device_map="auto" and it should work. CPU+GPU quantization for Llama 3.1 is still broken as far as I know.
I have a potential fix that may remedy both the "two devices" error and the rope_scaling issue (by way of allowing for a newer transformers version). Feel free to try out the patch here:
https://github.com/davedgd/transformers/tree/patch-1
e.g.,
pip install git+https://github.com/davedgd/transformers@patch-1
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)