Closed: burger-pb closed this issue 7 months ago
Running into the same issue trying to use AutoAWQ on Gemma 2b
Running into the same issue trying to use AutoAWQ on llama2
The latest version of transformers has changed the causal-mask implementation, which can cause this problem. Temporarily downgrading transformers to 4.38.* should work.
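If it helps, here is a minimal sketch for checking the pin before re-running quantization (any 4.38 patch release should do; the exact version spec below is just an example):

```python
# Ensure the environment is on the 4.38 series before quantizing,
# e.g. after: pip install "transformers>=4.38,<4.39"
import transformers
from packaging import version  # already installed as a transformers dependency

v = version.parse(transformers.__version__)
assert version.parse("4.38.0") <= v < version.parse("4.39.0"), (
    f"transformers {v} uses the new causal-mask code path; downgrade to 4.38.*"
)
```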
@TechxGenus Thanks, it works.
Fixed on the main branch. Pushing v0.2.4 to PyPI shortly. We will hold off on updating to a newer transformers version until they have fixed the issue or a contribution lands that solves it.
I fixed the quantization issue, which you can see below. However, inference does not run with fused modules due to another breaking update.
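For reference, the quantization path that was failing is the standard AutoAWQ flow; a minimal sketch following the README pattern (model and output paths are placeholders):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-hf"   # placeholder model id
quant_path = "llama-2-7b-awq"             # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration + quantization (the step that hit the causal-mask error)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```

Loading the result with AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True) is the fused-module inference path that still breaks, as noted above.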
cuda 12.1, autoawq-0.2.3+cu121

File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1067, in _update_causal_mask
    if hasattr(self.layers[0].self_attn, "past_key_value"):  # static cache
File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Catcher' object has no attribute 'self_attn'
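For context on the error itself: during calibration AutoAWQ swaps the first decoder layer for a "Catcher" module that records the layer's inputs and aborts the forward pass, while the new _update_causal_mask in transformers inspects self.layers[0].self_attn directly. A simplified illustration of the clash (not AutoAWQ's exact code, names are illustrative):

```python
import torch.nn as nn

class Catcher(nn.Module):
    """Stand-in for a decoder layer that records its inputs, then aborts the forward pass."""
    def __init__(self, layer):
        super().__init__()
        self.layer = layer      # the wrapped decoder layer
        self.captured = []      # calibration inputs collected here

    def forward(self, *args, **kwargs):
        self.captured.append((args, kwargs))
        raise ValueError        # stop the model forward after the first layer

# transformers >= 4.39 probes the first decoder layer while building the causal mask:
#     hasattr(self.layers[0].self_attn, "past_key_value")
# Once layers[0] has been replaced by a Catcher, evaluating `self.layers[0].self_attn`
# raises AttributeError inside nn.Module.__getattr__ before hasattr even runs,
# which is the error shown in the traceback above.
```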