Open kevin3314 opened 7 months ago
Hi kevin, I encountered the same problem. Did you solve it?
This kind of transient issue has been popping up ever since transformers 4.36 was released. Unfortunately, the post-4.36 code is not well suited to handling these input-argument issues, and for now I am not sure how to solve these errors in general. One workaround is to pip install "transformers<4.36.0" and find an AutoAWQ version that works with it.
CC @younesbelkada
I had the same issue today, installing transformers 4.35.2 seems to have worked.
First of all, thank you for the great work.
System info
autoawq==0.1.8
Details
While trying to quantize a GPT-NeoX model, I encountered the error below.
After digging into the code, it turned out that this line broke. The GPT-NeoX model transforms the attention mask from shape (batch_size, seq_len) to (batch_size, 1, 1, seq_len) before it reaches GPTNeoXLayer. During quantization, however, the input to GPTNeoXModel does not include attention_mask, so this transformation never happens. The untransformed attention mask, whose shape is incompatible with GPTNeoXLayer, is then appended in the line above, passed to the layer, and the error is raised.
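For illustration, here is a minimal dependency-free sketch of the kind of 2D-to-4D mask expansion that GPTNeoXModel performs before its layers run (the function name and the use of -inf as the fill value are assumptions for this example; the real implementation works on tensors and uses the dtype's minimum value):

```python
def expand_attention_mask(mask_2d, min_value=float("-inf")):
    """Turn a (batch_size, seq_len) 0/1 padding mask into the
    (batch_size, 1, 1, seq_len) additive form attention layers expect:
    1 -> 0.0 (position is attended), 0 -> min_value (position is masked)."""
    return [
        [[[0.0 if keep else min_value for keep in row]]]
        for row in mask_2d
    ]

mask = [[1, 1, 0], [1, 0, 0]]           # shape (batch_size=2, seq_len=3)
expanded = expand_attention_mask(mask)  # shape (2, 1, 1, 3)
```

If the layer receives the 2D mask instead of the expanded 4D one, the broadcasting inside attention fails, which matches the error described above.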
I think prepare_inputs_for_generation should be called before feeding inputs to modules[0], since that matches what model.generate() does in transformers.
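The proposed ordering can be sketched as follows. This is not AutoAWQ's actual code; the stub model and the capture_inputs helper are illustrative assumptions standing in for a real transformers model and the quantizer's input-capture step:

```python
class StubModel:
    """Minimal stand-in mimicking a transformers model's interface."""

    def prepare_inputs_for_generation(self, input_ids, attention_mask=None, **kwargs):
        # Real models adjust inputs here; e.g. GPT-NeoX expands the mask
        # from (batch, seq) to (batch, 1, 1, seq). Modeled as nested lists.
        expanded = [[[row]] for row in attention_mask]
        return {"input_ids": input_ids, "attention_mask": expanded}

    def __call__(self, input_ids, attention_mask):
        return input_ids, attention_mask


def capture_inputs(model, input_ids, attention_mask):
    # Proposed fix: run prepare_inputs_for_generation first, exactly as
    # model.generate() does, so modules[0] sees the prepared mask.
    model_inputs = model.prepare_inputs_for_generation(
        input_ids, attention_mask=attention_mask
    )
    return model(**model_inputs)


ids, mask = capture_inputs(StubModel(), [[1, 2]], [[1, 1]])
# mask now has the expanded (batch, 1, 1, seq) nesting
```

With this ordering, any model-specific preprocessing runs before the first layer is called, so the captured inputs have shapes the layer actually accepts.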