casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

AttributeError: 'Catcher' object has no attribute 'self_attn' #407

Closed · burger-pb closed this issue 7 months ago

burger-pb commented 7 months ago

cuda 12.1
autoawq 0.2.3+cu121

  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1196, in forward
    outputs = self.model(
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 990, in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1067, in _update_causal_mask
    if hasattr(self.layers[0].self_attn, "past_key_value"):  # static cache
  File "/root/anaconda3/envs/auto_awq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Catcher' object has no attribute 'self_attn'
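(The comment does not include the script that triggered the error. For context, a minimal sketch of a typical AutoAWQ quantization run that goes through this code path; the model path, output directory, and quant settings below are placeholders taken from the project's examples, not from this report.)

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths; substitute your own checkpoint and output directory.
model_path = "meta-llama/Llama-2-7b-hf"
quant_path = "llama-2-7b-awq"

# 4-bit AWQ settings as used in the AutoAWQ examples.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# The AttributeError above is raised during this call, when the calibration
# forward pass reaches transformers' LlamaModel._update_causal_mask.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```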

patleeman commented 7 months ago

Running into the same issue trying to use AutoAWQ on Gemma 2b

amazingkmy commented 7 months ago

Running into the same issue trying to use AutoAWQ on llama2

TechxGenus commented 7 months ago

The latest version of transformers has modified the implementation of the causal mask, which may cause problems. Temporarily downgrading transformers to 4.38.* should work.
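(For anyone applying the pin, one way to express it is below; this is a sketch, and the exact 4.38.x patch release is an assumption rather than something stated in the thread.)

```python
# Apply the pin from your shell, not from Python:
#   pip install "transformers>=4.38.0,<4.39.0"
# Then a quick sanity check that the downgrade took effect:
import transformers

assert transformers.__version__.startswith("4.38."), (
    f"Expected a 4.38.x transformers release, got {transformers.__version__}"
)
```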

amazingkmy commented 7 months ago

@TechxGenus thx, it works.

casper-hansen commented 7 months ago

Fixed on the main branch. Pushing v0.2.4 to PyPI shortly. We will hold off on updating to a newer transformers version until they have fixed the issue or a contribution that solves it lands here.

I fixed the quantization issue, which you can see below. However, inference does not run with fused modules due to another breaking update.

https://github.com/casper-hansen/AutoAWQ/tree/fix_quant
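(Not from the maintainer's comment, but a common AutoAWQ workaround while fused modules are broken is to load the quantized model with fusion disabled; a minimal sketch, assuming a quantized checkpoint like the one produced above and the standard from_quantized API.)

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_path = "llama-2-7b-awq"  # placeholder path to an AWQ-quantized checkpoint

# fuse_layers=False skips the fused attention/MLP modules that currently
# break with newer transformers releases; it trades some speed for compatibility.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(quant_path)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(next(model.parameters()).device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```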