Open Alireza3242 opened 1 week ago
I solved this problem with some changes for Gemma 2 9B, but we still have the problem with Gemma 2 27B:
1. In /usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py, in MODEL_NAME_PATTERN_MAP, add "Gemma2": "gemma2" before "Gemma": "gemma".
2. In /usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py, in MODEL_NAME_TO_HF_ARCH_MAP, change "gemma2": "GemmaForCausalLM" to "gemma2": "Gemma2ForCausalLM".
But with Gemma 2 27B, when we quantize with AWQ, I have another problem: at inference time, result.output_token_ids always equals [[-1]].
@Alireza3242 You said you solved this problem with some changes for Gemma 2 9B. How did you change it?
@Superjomn When can this problem be solved? I am also stuck here.
@imilli
1. In /usr/local/lib/python3.10/dist-packages/tensorrt_llm/quantization/quantize_by_modelopt.py, in MODEL_NAME_PATTERN_MAP, put "Gemma2": "gemma2" at the top of the list.
2. In /usr/local/lib/python3.10/dist-packages/modelopt/torch/export/tensorrt_llm_utils.py, in MODEL_NAME_TO_HF_ARCH_MAP, set "gemma2": "Gemma2ForCausalLM".
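Since Python dicts preserve insertion order, "putting 'Gemma2' at the top of the list" amounts to rebuilding the map with that key first. A minimal sketch of that rebuild, using an assumed subset of MODEL_NAME_PATTERN_MAP rather than the real module contents:

```python
def move_key_first(mapping: dict, key: str, value: str) -> dict:
    """Return a copy of `mapping` with `key: value` as its first entry."""
    reordered = {key: value}
    reordered.update({k: v for k, v in mapping.items() if k != key})
    return reordered

# Hypothetical original ordering, where "Gemma" would be matched first.
original = {"Gemma": "gemma", "Gemma2": "gemma2"}
fixed = move_key_first(original, "Gemma2", "gemma2")
print(list(fixed))  # -> ['Gemma2', 'Gemma']
```

Editing files under dist-packages is lost on reinstall, so a helper like this could also be applied at runtime before quantization, if you prefer not to patch the installed sources.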
System Info
A100
Who can help?
@Tracin
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
I tried to quantize a Gemma 2 9B model with AWQ.
Expected behavior
Quantization completes without error.
Actual behavior
Additional notes
No additional notes.