0cc4m / KoboldAI

Failed to load 4bit-128g WizardLM 7B #36

Open · lee-b opened 1 year ago

lee-b commented 1 year ago

Not sure if this is meant to work at present, but I get `RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]` when loading https://huggingface.co/TheBloke/WizardLM-7B-uncensored-GPTQ/, after cloning it and symlinking the checkpoint to the name the loader expects: `ln -s WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors 4bit-128g.compat.no-act-order.safetensors`.

This is with:

```
$ git show HEAD
commit 0a7113f99780abb15e9a058a7a8501767e54940a (HEAD -> latestgptq, origin/latestgptq, origin/HEAD)
Merge: b8e8b0f 530e204
Author: 0cc4m <picard12@live.de>
Date:   Wed May 24 06:32:54 2023 +0200

    Merge pull request #35 from YellowRoseCx/patch-1

    Update README.md to GPTQ-KoboldAI 0.0.5

$ git remote -v
origin  https://github.com/0cc4m/KoboldAI (fetch)
origin  https://github.com/0cc4m/KoboldAI (push)
```

```
Colab Check: False, TPU: False
INFO       | __main__:general_startup:1312 - Running on Repo: https://github.com/0cc4m/KoboldAI Branch: latestgptq
INIT       | Starting   | Flask
INIT       | OK         | Flask
INIT       | Starting   | Webserver
INIT       | Starting   | LUA bridge
INIT       | OK         | LUA bridge
INIT       | Starting   | LUA Scripts
INIT       | OK         | LUA Scripts
Setting Seed
INIT       | OK         | Webserver
MESSAGE    | Webserver started! You may now connect with a browser at http://127.0.0.1:5000
Connection Attempt: 127.0.0.1
INFO       | __main__:do_connect:2805 - Client connected! UI_1
Connection Attempt: 127.0.0.1
INFO       | __main__:do_connect:2805 - Client connected! UI_1
ERROR      | koboldai_settings:__setattr__:1210 - __setattr__ just set model_selected to NeoCustom in koboldai_vars. That variable isn't defined!
INFO       | __main__:get_model_info:1513 - Selected: NeoCustom, /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ
INIT       | Searching  | GPU support
INIT       | Found      | GPU support
INIT       | Starting   | Transformers
INIT       | Info       | Final device configuration:
       DEVICE ID  |  LAYERS  |  DEVICE NAME
   (primary)   0  |      32  |  NVIDIA GeForce RTX 3090
               1  |       0  |  Tesla P40
               2  |       0  |  Tesla P40
             N/A  |       0  |  (Disk cache)
             N/A  |       0  |  (CPU)
INFO       | modeling.inference_models.hf_torch_4bit:_get_model:371 - Using GPTQ file: /home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ/4bit-128g.safetensors, 4-bit model, type llama, version 2, groupsize 128
Loading model ...
Done.
Exception in thread Thread-18:
Traceback (most recent call last):
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 731, in _handle_event_internal
    r = server._trigger_event(data[0], namespace, sid, *data[1:])
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/socketio/server.py", line 756, in _trigger_event
    return self.handlers[namespace][event](*args)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 282, in _handler
    return self._handle_event(handler, message, namespace, sid,
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/flask_socketio/__init__.py", line 828, in _handle_event
    ret = handler(*args)
  File "aiserver.py", line 615, in g
    return f(*a, **k)
  File "aiserver.py", line 3191, in get_message
    load_model(use_gpu=msg['use_gpu'], gpu_layers=msg['gpu_layers'], disk_layers=msg['disk_layers'], online_model=msg['online_model'])
  File "aiserver.py", line 1980, in load_model
    model.load(
  File "/home/lb/GIT/KoboldAI/modeling/inference_model.py", line 177, in load
    self._load(save_model=save_model, initial_load=initial_load)
  File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 199, in _load
    self.tokenizer = self._get_tokenizer(self.get_local_model_path())
  File "/home/lb/GIT/KoboldAI/modeling/inference_models/hf_torch_4bit.py", line 391, in _get_tokenizer
    tokenizer = LlamaTokenizer.from_pretrained(utils.koboldai_vars.custmodpth)
  File "aiserver.py", line 112, in new_pretrainedtokenizerbase_from_pretrained
    tokenizer = old_pretrainedtokenizerbase_from_pretrained(cls, *args, **kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1811, in from_pretrained
    return cls._from_pretrained(
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 1965, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/lb/GIT/KoboldAI/runtime/envs/koboldai/lib/python3.8/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1102) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
```
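
The traceback bottoms out in SentencePiece failing to parse tokenizer.model, so a quick way to take KoboldAI out of the loop is to load that file directly. A minimal sketch of the check (my assumption, not a confirmed diagnosis: an incomplete git-lfs checkout would produce exactly this parse failure; run it in the same env and adjust MODEL_DIR if needed):

```python
# Sanity check: can tokenizer.model be parsed as a SentencePiece model at all?
import os
import sentencepiece as spm

MODEL_DIR = "/home/lb/GIT/KoboldAI/models/TheBloke_WizardLM-7B-uncensored-GPTQ"
path = os.path.join(MODEL_DIR, "tokenizer.model")

with open(path, "rb") as f:
    head = f.read(64)
print(f"{os.path.getsize(path)} bytes, starts with {head[:32]!r}")

# An incomplete git-lfs checkout leaves a small text pointer file here
# instead of the real protobuf data, which would fail exactly like above.
if head.startswith(b"version https://git-lfs.github.com/spec/v1"):
    print("tokenizer.model is a git-lfs pointer, not the actual file")
else:
    sp = spm.SentencePieceProcessor()
    sp.Load(path)  # raises the same RuntimeError on a corrupt file
    print("tokenizer.model parses fine:", sp.GetPieceSize(), "pieces")
```
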
lee-b commented 1 year ago

Hmm. Also fails on TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, with the same error. Yet the same download/loading process works fine on other 4-bit 128g safetensors GPTQ models, like MetalX_GPT4-X-Alpasta-30b-4bit. Am I missing something here?
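
If the cause is a bad tokenizer.model (my guess, not confirmed), comparing that file across the working and failing folders should show it immediately. A rough sketch, assuming the stock KoboldAI models/ layout:

```python
# Compare tokenizer.model across all local model folders; a truncated file
# or a git-lfs pointer should stand out by its size and first bytes.
import os

MODELS_ROOT = "/home/lb/GIT/KoboldAI/models"
for name in sorted(os.listdir(MODELS_ROOT)):
    path = os.path.join(MODELS_ROOT, name, "tokenizer.model")
    if not os.path.isfile(path):
        continue
    with open(path, "rb") as f:
        head = f.read(16)
    print(f"{name}: {os.path.getsize(path):>10} bytes, starts with {head!r}")
```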

lee-b commented 1 year ago

Ah, I think this is the same as #11, but I'm not convinced the issue is model-side as that ticket suggests. Maybe the loader needs to parse the model's config.json differently, for instance?
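
If it is a config-parsing difference, a cheap first check is to diff the config.json of a model that loads against one that fails. A throwaway sketch (the two paths are just the examples from this thread; adjust locally):

```python
# Diff config.json between a model that loads and one that fails.
import json

def load_config(model_dir):
    with open(f"{model_dir}/config.json") as f:
        return json.load(f)

good = load_config("models/MetalX_GPT4-X-Alpasta-30b-4bit")
bad = load_config("models/TheBloke_WizardLM-7B-uncensored-GPTQ")

for key in sorted(set(good) | set(bad)):
    if good.get(key) != bad.get(key):
        print(f"{key}: {good.get(key)!r} (works) vs {bad.get(key)!r} (fails)")
```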

lee-b commented 1 year ago

OK, so at least for TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g, it only works if I download both the .pt and the .safetensors files. I think it should try the .safetensors first, and only look for a .pt if no .safetensors is available, right? Something like the sketch below.
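
To be concrete, this is the lookup order I have in mind. Illustrative sketch only: `find_gptq_checkpoint` is a name I made up, and I haven't checked how the actual selection in modeling/inference_models/hf_torch_4bit.py is structured:

```python
# Proposed lookup order: prefer a .safetensors checkpoint, fall back to .pt.
from pathlib import Path

def find_gptq_checkpoint(model_dir: str) -> Path:
    for pattern in ("*.safetensors", "*.pt"):
        matches = sorted(Path(model_dir).glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .safetensors or .pt checkpoint in {model_dir}")

print(find_gptq_checkpoint("models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g"))
```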