PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device, and it is 100% private.
Apache License 2.0
19.65k stars, 2.2k forks

Keeps re-downloading entire model every new session #544

Open ChrisBNECBT opened 9 months ago

ChrisBNECBT commented 9 months ago

Hi,

I have a problem where the program keeps re-downloading the model for every new session. Does anyone know of a fix for this?

(ps. I'm not a programmer please be gentle with the explanation)

The model was downloaded to "C:\Users\[user]\.cache\huggingface\hub" by default.

Realizing that the program re-downloads the model for every new session, I decided to copy the entire folder for the model "models--TheBloke--WizardLM-13B-V1.2-GPTQ" into "C:\localGPT\models".

Now that I have two copies of the model, one in "C:\Users\[user]\.cache\huggingface\hub" and one in "C:\localGPT\models", the program still re-downloads the entire model at every new session.
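For what it's worth, huggingface_hub does not scan arbitrary folders such as C:\localGPT\models; it only looks in its own cache root, which is controlled by environment variables. A minimal sketch of redirecting that cache (the target directory here is a hypothetical example, and this assumes run_localGPT.py loads the model through huggingface_hub):

```python
import os

# Assumption: huggingface_hub resolves its cache root from HF_HUB_CACHE,
# then HF_HOME, then the default ~/.cache/huggingface/hub. Copying model
# folders anywhere else has no effect on cache lookups.
# This must be set BEFORE any huggingface_hub/transformers import
# (or exported in the shell before launching run_localGPT.py).
os.environ["HF_HOME"] = r"C:\localGPT\hf_cache"  # hypothetical path

# With HF_HOME set, the hub cache becomes <HF_HOME>/hub:
hub_cache = os.path.join(os.environ["HF_HOME"], "hub")
print(hub_cache)
```

Equivalently, set it once in the shell (e.g. `setx HF_HOME "C:\localGPT\hf_cache"` in cmd) so every session reuses the same cache directory instead of the per-session default.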

Below is the output of the process.

C:\localGPT>python run_localGPT.py
2023-09-30 13:42:05,682 - INFO - run_localGPT.py:221 - Running on: cuda
2023-09-30 13:42:05,684 - INFO - run_localGPT.py:222 - Display Source Documents set to: False
2023-09-30 13:42:05,685 - INFO - run_localGPT.py:223 - Use history set to: False
2023-09-30 13:42:07,535 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length 512
2023-09-30 13:42:09,528 - INFO - posthog.py:16 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-09-30 13:42:09,831 - INFO - run_localGPT.py:56 - Loading Model: TheBloke/WizardLM-13B-V1.2-GPTQ, on: cuda
2023-09-30 13:42:09,833 - INFO - run_localGPT.py:57 - This action can take a few minutes!
2023-09-30 13:42:09,833 - INFO - load_models.py:86 - Using AutoGPTQForCausalLM for quantized models
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████████| 746/746 [00:00<?, ?B/s]
C:\Users\[user]\AppData\Local\Programs\Python\Python310\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: huggingface_hub cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\[user]\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the HF_HUB_DISABLE_SYMLINKS_WARNING environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations. To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading tokenizer.model: 100%|██████████████████████████████████████████████████| 500k/500k [00:00<00:00, 1.11MB/s]
Downloading (…)/main/tokenizer.json: 100%|████████████████████████████████████████| 1.84M/1.84M [00:01<00:00, 1.65MB/s]
Downloading (…)in/added_tokens.json: 100%|██████████████████████████████████████████████████| 21.0/21.0 [00:00<?, ?B/s]
Downloading (…)cial_tokens_map.json: 100%|██████████████████████████████████████████████████| 96.0/96.0 [00:00<?, ?B/s]
2023-09-30 13:42:16,699 - INFO - load_models.py:93 - Tokenizer loaded
Downloading (…)lve/main/config.json: 100%|████████████████████████████████████████████████████| 937/937 [00:00<?, ?B/s]
Downloading (…)quantize_config.json: 100%|████████████████████████████████████████████████████| 187/187 [00:00<?, ?B/s]
Downloading model.safetensors: 1%|▍ | 73.4M/7.26G [00:10<15:54, 7.52MB/s]
Aborted!
Downloading model.safetensors: 1%|▍ | 73.4M/7.26G [00:10<17:37, 6.80MB/s]

PromtEngineer commented 9 months ago

@ChrisBNECBT I haven't encountered this before. Are you creating a new virtual env every time, or is it the same virtual env?

bluciano212 commented 9 months ago

Hi! I have the same issue. I already have the model downloaded on my SSD drive, but every time I launch "python3 run_localGPT.py --device_type cpu" it starts downloading the model from TheBloke again. I modified constants.py with:

MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_1.bin"

and copied llama-2-7b-chat.ggmlv3.q4_1.bin into localGPT/models/models--TheBloke--Llama-2-7B-Chat-GGML, but it does not see it and starts downloading the model every time. I do not have any virtual env. I am on Armbian 23.8.3 (Ubuntu light).

[Screenshot (7)]
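Copying a bare .bin directly into a models--Org--Name folder will generally not be recognized, because the hub cache expects a specific internal layout (refs/, blobs/, and snapshots/ with the snapshot files pointing into blobs/). A rough diagnostic sketch, assuming that layout and assuming localGPT resolves the weights through huggingface_hub's cache:

```python
import os

def looks_like_hub_cache_entry(model_dir: str) -> bool:
    """Heuristic: a complete huggingface_hub cache entry (models--Org--Name)
    contains refs/, blobs/ and snapshots/ subdirectories. A weight file
    copied directly into the folder does not match this layout, so the
    library treats the model as absent and downloads it again."""
    return all(
        os.path.isdir(os.path.join(model_dir, sub))
        for sub in ("refs", "blobs", "snapshots")
    )

# Example: the folder the .bin was copied into (hypothetical relative path)
print(looks_like_hub_cache_entry(
    "localGPT/models/models--TheBloke--Llama-2-7B-Chat-GGML"
))  # prints False for a folder holding only a bare weight file
```

If the check returns False, re-downloading once through the library (so it creates the layout itself) and then pointing the cache at that location is usually more reliable than hand-copying files.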

hyperp0ppy commented 8 months ago

I'm having the same problem too.

susufff commented 2 months ago

Me too.

spicedreams commented 1 month ago

Me too. I have built the docker image (once) and then run it several times. Each time I run the image it downloads:

tokenizer_config.json
tokenizer.json
special_tokens_map.json
config.json (which warns "Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']")
model.safetensors

These seem to be cached between runs of the model (i.e. "python run_localGPT.py") but not between runs of the image/container.
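That symptom is consistent with the cache living inside the container filesystem, which is discarded with the container, so every fresh `docker run` starts empty. One common fix (assuming the image runs as root, so the cache lands in /root/.cache/huggingface) is to bind-mount the host cache, e.g. `docker run -v ~/.cache/huggingface:/root/.cache/huggingface …`. A sketch of how huggingface_hub resolves its cache root, based on its documented defaults, which shows why an unmounted container never finds previous downloads:

```python
import os

# Assumption (from huggingface_hub's documented behavior): the cache root
# is HF_HUB_CACHE if set, else <HF_HOME>/hub, else ~/.cache/huggingface/hub.
# Inside a fresh container none of these point at persistent storage unless
# a volume is mounted there or HF_HOME is redirected to a mounted path.
cache_root = os.environ.get("HF_HUB_CACHE") or os.path.join(
    os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface")),
    "hub",
)
print(cache_root)
```

Running this inside and outside the container should make the mismatch visible: the in-container path exists only for the lifetime of that container.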

Inside the container, the snapshot file is a symlink:

models/models--unsloth--llama-3-8b-bnb-4bit/snapshots/90bd376f78b5ba9ad646082570a99a33801c99ef/model.safetensors -> ../../blobs/10f8e0f74de9ebbc8529d80f3685a5a7beef1242c7eaa7600e978fd08f983db7

and the blob it points to exists:

-rw-r--r-- 1 root root 5702746405 2024-06-17 06:19 models/models--unsloth--llama-3-8b-bnb-4bit/blobs/10f8e0f74de9ebbc8529d80f3685a5a7beef1242c7eaa7600e978fd08f983db7