Loading model ...
Found 3 unique KN Linear values.
Warming up autotune cache ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:51<00:00, 4.27s/it]
Found 1 unique fused mlp KN values.
Warming up autotune cache ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:24<00:00, 2.03s/it]
Done.
Traceback (most recent call last):
File "/data/Chinese-Vicuna/tools/quant_generate.py", line 207, in <module>
main()
File "/data/Chinese-Vicuna/tools/quant_generate.py", line 123, in main
tokenizer = LlamaTokenizer.from_pretrained(args.model_path)
File "/home/nano/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1795, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for './llama-hf/llama-7b/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure './llama-hf/llama-7b/' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
python quant_generate.py --model_path ./llama-hf/llama-7b/ --quant_path llama7b-4bit-128g.pt --wbits 4 --groupsize 128 --gradio
I got the error above when running this command, but I can't figure out where the problem is. Any help would be appreciated.
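For reference, the OSError suggests the directory exists but is missing the tokenizer files. A quick diagnostic sketch to list which ones are absent (the file names are assumed from the standard Hugging Face LLaMA conversion output, where `tokenizer.model` is the essential one):

```python
import os

# Files a converted HF-format LLaMA checkpoint normally ships for its
# tokenizer; tokenizer.model (the SentencePiece model) is the critical one.
EXPECTED = ["tokenizer.model", "tokenizer_config.json", "special_tokens_map.json"]

def missing_tokenizer_files(model_path):
    """Return the expected tokenizer files that are absent from model_path."""
    return [name for name in EXPECTED
            if not os.path.isfile(os.path.join(model_path, name))]

if __name__ == "__main__":
    print(missing_tokenizer_files("./llama-hf/llama-7b/"))
```

If the list is non-empty, re-running the weight-conversion step (or copying the tokenizer files into `./llama-hf/llama-7b/`) should let `LlamaTokenizer.from_pretrained` succeed.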