ValueError: We were not able to get the tokenizer using `AutoTokenizer.from_pretrained` with the string that you have passed /data/mlops/Qwen-7B-Chat. If you have a custom tokenizer, you can pass it as input. For now, we only support quantization for text model. Support for vision, speech and multimodel will come later. #287
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:08<00:00, 1.00s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/optimum/gptq/quantizer.py", line 291, in quantize_model
    tokenizer = AutoTokenizer.from_pretrained(tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 784, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class QWenTokenizer does not exist or is not currently imported.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "finetune.py", line 365, in <module>
    train()
  File "finetune.py", line 298, in train
    model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 3768, in from_pretrained
    quantizer.quantize_model(model, quantization_config.tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/optimum/gptq/quantizer.py", line 293, in quantize_model
    raise ValueError(
ValueError: We were not able to get the tokenizer using `AutoTokenizer.from_pretrained` with the string that you have passed /data/mlops/Qwen-7B-Chat. If you have a custom tokenizer, you can pass it as input. For now, we only support quantization for text model. Support for vision, speech and multimodel will come later.
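The error message itself points at a workaround: pass the tokenizer object instead of the path string, so optimum's quantizer never has to call `AutoTokenizer.from_pretrained` on its own (which fails because `QWenTokenizer` is custom remote code). Below is a minimal sketch of that idea, assuming quantization is driven through `transformers.GPTQConfig`; the `bits` and `dataset` values are placeholders, not taken from `finetune.py`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_path = "/data/mlops/Qwen-7B-Chat"

# QWenTokenizer ships as remote code inside the checkpoint, so
# trust_remote_code=True is required for AutoTokenizer to find the class.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Pass the tokenizer *object* to GPTQConfig; a plain path string would make
# optimum call AutoTokenizer.from_pretrained itself, which raises the error above.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
)
```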