PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device; it is 100% private.
Apache License 2.0

OSError: Can't load tokenizer for 'TheBloke/Llama-2-7B-Chat-GGML'. #280

Open kai-breitbarth opened 1 year ago

kai-breitbarth commented 1 year ago

I didn't change anything in run_localGPT.py:

model_id="TheBloke/Llama-2-7B-Chat-GGML"
model_basename = "llama-2-7b-chat.ggmlv3.q4_0.bin"

Everything installed fine until I tried to start python run_localGPT.py.

The error message is:

File "/home/kai/anaconda3/envs/localGPT/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'TheBloke/Llama-2-7B-Chat-GGML'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'TheBloke/Llama-2-7B-Chat-GGML' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
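
For reference, the failure can be reproduced outside localGPT with a minimal snippet. TheBloke's GGML repos host quantized .bin weights for llama.cpp rather than the tokenizer files that transformers expects, so from_pretrained has nothing to load (a sketch, assuming only that transformers is installed):

# Minimal reproduction sketch: the GGML repo contains no tokenizer files,
# so this raises the same OSError as run_localGPT.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-Chat-GGML")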

I have CUDA 11.7, and AutoGPT installed successfully:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

I found this discussion on Hugging Face: https://huggingface.co/TheBloke/LLaMa-7B-GGML/discussions/2 and also tried installing ctransformers with CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers

No clue what I'm doing wrong. Any help appreciated.

Thanks,

Kai

DeutscheGabanna commented 1 year ago

I had a similar problem. If you installed your dependencies with conda install --file requirements.txt, I'd suggest reinstalling them with pip install -r requirements.txt and then running python run_localGPT_API.py.

PromtEngineer commented 1 year ago

@kai-breitbarth At the moment, GGML files are only supported on CPU/MPS. If you are running this on an Nvidia GPU, look for GPTQ files for the models. There is a PR I am testing that will fix this issue.
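
In other words, the quantization format has to match the device. A rough sketch of that rule (the GPTQ repo id and basename below are illustrative assumptions, not verified paths):

# Sketch of the format/device pairing described above. The GGML entries match
# this thread; the GPTQ repo id and basename are illustrative assumptions.
def pick_model(device_type: str) -> tuple[str, str]:
    if device_type in ("cpu", "mps"):
        # GGML quantizations run through llama.cpp bindings on CPU or Apple MPS.
        return ("TheBloke/Llama-2-7B-Chat-GGML", "llama-2-7b-chat.ggmlv3.q4_0.bin")
    # On an Nvidia (CUDA) GPU, use a GPTQ quantization instead.
    return ("TheBloke/Llama-2-7b-Chat-GPTQ", "gptq_model-4bit-128g.safetensors")

print(pick_model("cuda"))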

AmirHoss96 commented 1 year ago

Pass the device type as 'cpu' on line 228. It should fix the problem.
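
As a sketch of that workaround (the load_model name and signature here are assumptions based on this thread, not the actual code at line 228):

# Hypothetical stand-in for the loader in run_localGPT.py; the point is
# simply to pass "cpu" so the GGML file is handled by llama.cpp, not CUDA.
def load_model(device_type, model_id, model_basename=None):
    print(f"Loading {model_basename or model_id} on {device_type}")

load_model(
    "cpu",  # instead of "cuda"
    model_id="TheBloke/Llama-2-7B-Chat-GGML",
    model_basename="llama-2-7b-chat.ggmlv3.q4_0.bin",
)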

PromtEngineer commented 1 year ago

I just pushed a new update that will fix it. If you are running on GPU, make sure you reinstall llama-cpp using the instructions provided in the readme for GGML models.
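
After reinstalling, a quick sanity check is to confirm that the package imports and which version is present (a sketch using only the standard library; "llama-cpp-python" is the PyPI distribution name, llama_cpp the module):

# Sketch: verify llama-cpp-python survived the rebuild.
from importlib.metadata import version

import llama_cpp  # fails here if the reinstall did not complete
print("llama-cpp-python", version("llama-cpp-python"))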

kai-breitbarth commented 1 year ago

Thanks a lot for the fast help! @DeutscheGabanna Hello! Until now I hadn't tried the API. The API runs with the Wizard model on GPU, so a first success! @PromtEngineer Thanks a lot for the update! I rebuilt everything per the readme, and it works now. But for some weird reason it was still using the CPU. I switched to the Wizard model, and then it ran on the GPU as expected. So no Llama 2, but a running version.

kai-breitbarth commented 1 year ago

Ohhhhhhh. It's the difference between GGML and GPTQ... of course. I have to learn a lot...

helloworld53 commented 1 year ago

Can someone please help? I'm still getting this error. I don't have a GPU, only a CPU, but I keep getting: OSError: Can't load tokenizer for 'TheBloke/Llama-2-7B-Chat-GGML'

PromtEngineer commented 1 year ago

@helloworld4774 What model basename are you using?

helloworld53 commented 1 year ago

model_id = "TheBloke/Llama-2-7B-Chat-GGML"
model_basename = "llama-2-7b-chat.ggmlv3.q4_0.bin"

mr-nobody15 commented 1 year ago

Install a new conda environment and try running the Python code (after some cd...) without pip-installing any libraries. That worked for me with the model vilsonrodrigues/falcon-7b-instruct-sharded. Hope it works.

karthikcs commented 1 year ago

@PromtEngineer I have the same issue. I am running on Windows 11 with CPU and get the same error while loading the Llama 2 tokenizer.

karthikcs commented 1 year ago

@PromtEngineer, I have a pull request to fix this. Please review.