ValueError: We were not able to get the tokenizer using `AutoTokenizer.from_pretrained` with the string that you have passed /data/mlops/Qwen-7B-Chat. If you have a custom tokenizer, you can pass it as input. For now, we only support quantization for text model. Support for vision, speech and multimodel will come later. #287
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|██████████| 8/8 [00:08<00:00, 1.00s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/optimum/gptq/quantizer.py", line 291, in quantize_model
    tokenizer = AutoTokenizer.from_pretrained(tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/tokenization_auto.py", line 784, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class QWenTokenizer does not exist or is not currently imported.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "finetune.py", line 365, in <module>
    train()
  File "finetune.py", line 298, in train
    model = transformers.AutoModelForCausalLM.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.8/dist-packages/transformers/modeling_utils.py", line 3768, in from_pretrained
    quantizer.quantize_model(model, quantization_config.tokenizer)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/optimum/gptq/quantizer.py", line 293, in quantize_model
    raise ValueError(
ValueError: We were not able to get the tokenizer using `AutoTokenizer.from_pretrained` with the string that you have passed /data/mlops/Qwen-7B-Chat. If you have a custom tokenizer, you can pass it as input. For now, we only support quantization for text model. Support for vision, speech and multimodel will come later.
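The error message itself points at a workaround: pass the tokenizer object instead of the path string, so optimum's quantizer never has to call `AutoTokenizer.from_pretrained` on its own (which fails because `QWenTokenizer` is custom remote code). Below is a minimal sketch of that idea, assuming quantization is driven through `transformers.GPTQConfig`; the `bits` and `dataset` values are placeholders, not taken from `finetune.py`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_path = "/data/mlops/Qwen-7B-Chat"

# QWenTokenizer ships as remote code inside the checkpoint, so
# trust_remote_code=True is required for AutoTokenizer to find the class.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Pass the tokenizer *object* to GPTQConfig; a plain path string would make
# optimum call AutoTokenizer.from_pretrained itself, which raises the error above.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=gptq_config,
    device_map="auto",
    trust_remote_code=True,
)
```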