Closed: wereretot closed this issue 2 months ago
The tokenizer isn't a LlamaTokenizer; you need to pass --tokenizer, pointing it at the original 16-bit repo.
aphrodite run '/home/rexommendation/Programs/koboldcpp/model/GGUF/Mistral-Nemo-Instruct-2407' --tokenizer Mistral-Nemo-Instruct-2407
INFO: Initializing the Aphrodite Engine (v0.5.3) with the following config:
INFO: Model = '/home/rexommendation/Programs/koboldcpp/model/GGUF/Mistral-Nemo-Instruct-2407'
INFO: Speculative Config = None
INFO: DataType = torch.bfloat16
INFO: Model Load Format = auto
INFO: Number of GPUs = 1
INFO: Disable Custom All-Reduce = False
INFO: Quantization Format = None
INFO: Context Length = 1024000
INFO: Enforce Eager Mode = True
INFO: KV Cache Data Type = auto
INFO: KV Cache Params Path = None
INFO: Device = cuda
INFO: Guided Decoding Backend = DecodingConfig(guided_decoding_backend='outlines')
/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Traceback (most recent call last):
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
response.raise_for_status()
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/Mistral-Nemo-Instruct-2407/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/transformers/utils/hub.py", line 398, in cached_file
resolved_file = hf_hub_download(
^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/utils/_deprecation.py", line 101, in inner_f
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1240, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1347, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1854, in _raise_on_head_call_error
raise head_call_error
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1751, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
^^^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 1673, in get_hf_file_metadata
r = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 376, in _request_wrapper
response = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/file_download.py", line 400, in _request_wrapper
hf_raise_for_status(response)
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/huggingface_hub/utils/_errors.py", line 352, in hf_raise_for_status
raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-669a6ab0-70356bef40d00edf5dc263b5;e46834bb-8d4a-4e76-8f8d-e3af6ce380b8)
Repository Not Found for url: https://huggingface.co/Mistral-Nemo-Instruct-2407/resolve/main/tokenizer_config.json.
Please make sure you specified the correct repo_id and repo_type.
If you are trying to access a private or gated repo, make sure you are authenticated.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/bin/aphrodite", line 8, in huggingface-cli login
or by passing token=<your_token>
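A note for anyone hitting the same 404: the URL in the traceback has no owner segment before the model name, which suggests the --tokenizer value was looked up as a bare Hub repo ID. Below is a minimal sketch of that mapping, assuming the URL layout visible in the log. The helpers resolve_url and has_namespace are hypothetical names for illustration, not part of huggingface_hub or Aphrodite, and mistralai/Mistral-Nemo-Instruct-2407 is the repo taken from the URL used later in this thread.

```python
HUB = "https://huggingface.co"

def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a file URL in the same shape as the one in the traceback (illustrative only)."""
    return f"{HUB}/{repo_id}/resolve/{revision}/{filename}"

def has_namespace(repo_id: str) -> bool:
    """True when repo_id looks like 'owner/name' rather than a bare name."""
    owner, sep, name = repo_id.partition("/")
    return bool(owner) and sep == "/" and bool(name) and "/" not in name

bare = "Mistral-Nemo-Instruct-2407"            # what was passed to --tokenizer
full = "mistralai/Mistral-Nemo-Instruct-2407"  # owner/name form

print(resolve_url(bare, "tokenizer_config.json"))  # same URL as the 404 in the log
print(resolve_url(full, "tokenizer_config.json"))
print(has_namespace(bare), has_namespace(full))    # False True
```

A bare name is not always wrong (some older Hub repos have no owner), but here it appears to be the cause. With the owner included, the command would presumably be: aphrodite run '/home/rexommendation/Programs/koboldcpp/model/GGUF/Mistral-Nemo-Instruct-2407' --tokenizer mistralai/Mistral-Nemo-Instruct-2407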
When I use this command it gives me this error for each temp file
aphrodite run 'https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407'
/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/transformers/utils/hub.py:580: FutureWarning: Using from_pretrained with the url of a file (here https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407) is deprecated and won't be possible anymore in v5 of Transformers. You should host your file on the Hub (hf.co) instead and use the repository ID. Note that this is not compatible with the caching system (your file will be downloaded at each execution) or multiple processes (each process will download the file in a different temporary file).
warnings.warn(
(…).co/mistralai/Mistral-Nemo-Instruct-2407: 100%|████████████████████████████████████████████████████████████████████████| 135k/135k [00:00<00:00, 989kB/s]
Traceback (most recent call last):
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/transformers/configuration_utils.py", line 716, in _get_config_dict
config_dict = cls._dict_from_json_file(resolved_config_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/site-packages/transformers/configuration_utils.py", line 815, in _dict_from_json_file
return json.loads(text)
^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/lib/python3.11/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/rexommendation/miniforge3/envs/aphrodite-engine/bin/aphrodite", line 8, in
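The FutureWarning in this log asks for a repository ID instead of a URL. The JSONDecodeError at line 1 column 1 is consistent with the downloaded content not being JSON at all, though the log does not confirm what was actually fetched. A minimal sketch of turning a model page URL into a repo ID follows; to_repo_id is a hypothetical helper, not part of Transformers or Aphrodite.

```python
from urllib.parse import urlparse

def to_repo_id(model: str) -> str:
    """Turn a huggingface.co model page URL into 'owner/name'.

    Anything that is not such a URL (local paths, existing repo IDs) is
    returned unchanged. Only model pages are handled; dataset or space
    URLs carry an extra leading path segment and are out of scope here.
    """
    parsed = urlparse(model)
    if parsed.scheme in ("http", "https") and parsed.netloc in ("huggingface.co", "hf.co"):
        parts = [p for p in parsed.path.split("/") if p]
        if len(parts) >= 2:
            return "/".join(parts[:2])
    return model

print(to_repo_id("https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407"))
# mistralai/Mistral-Nemo-Instruct-2407
```

Under that assumption the command would presumably be: aphrodite run mistralai/Mistral-Nemo-Instruct-2407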
@AlpinDale I'm unable to run Nemo fp16 in Aphrodite. This is the error I'm getting:
[rank0]: ValueError: Head size 160 is not supported by PagedAttention. Supported head sizes are: [64, 80, 96, 112, 128, 256].
(RayWorkerAphrodite pid=58123) ERROR: Error executing method load_model. This might cause deadlock in distributed execution.
I was able to run it with vLLM, however. I think they had the same issue and fixed it here: https://github.com/vllm-project/vllm/pull/6548
Edit: OK, I was able to run it in Aphrodite with those changes. I can open a pull request for it.
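For context on where a head size of 160 can come from, here is a sketch under stated assumptions. It assumes the model's config.json has hidden_size 5120, num_attention_heads 32 and an explicit head_dim of 128; those values are recalled from the published Mistral-Nemo config and are not shown in this thread, so check your own config.json. The head_size helper is hypothetical and only illustrates preferring an explicit head_dim over the derived value. It is not the actual Aphrodite or vLLM code.

```python
# Supported sizes copied from the PagedAttention error message in this thread.
SUPPORTED_HEAD_SIZES = [64, 80, 96, 112, 128, 256]

def head_size(config: dict) -> int:
    """Prefer an explicit head_dim; otherwise derive it from the hidden size."""
    explicit = config.get("head_dim")
    if explicit is not None:
        return explicit
    return config["hidden_size"] // config["num_attention_heads"]

# Assumed config values (not taken from this thread).
nemo_like = {"hidden_size": 5120, "num_attention_heads": 32, "head_dim": 128}
derived_only = {k: v for k, v in nemo_like.items() if k != "head_dim"}

print(head_size(derived_only), head_size(derived_only) in SUPPORTED_HEAD_SIZES)  # 160 False
print(head_size(nemo_like), head_size(nemo_like) in SUPPORTED_HEAD_SIZES)        # 128 True
```

If the loader ignores head_dim and divides 5120 by 32, it gets 160, which is not in the supported list. Reading the explicit field gives 128, which is.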
Fixed as of v0.6.0
Your current environment
🐛 Describe the bug
mamba activate aphrodite-engine
aphrodite run '/home/rexommendation/Programs/koboldcpp/model/GGUF/Mistral-Nemo-Instruct-2407-Q5_K_M.gguf'
It throws this error when loading