PromtEngineer / localGPT

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
Apache License 2.0
19.94k stars 2.22k forks source link

How to authenticate to huggingface.co, from the run_localGPT.py script, using Docker? #797

Closed dportabella closed 4 months ago

dportabella commented 4 months ago
$ docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind  localgpt

==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
CUDA extension not installed.
CUDA extension not installed.
2024-05-16 23:04:49,632 - INFO - run_localGPT.py:244 - Running on: cuda
2024-05-16 23:04:49,632 - INFO - run_localGPT.py:245 - Display Source Documents set to: False
2024-05-16 23:04:49,632 - INFO - run_localGPT.py:246 - Use history set to: False
2024-05-16 23:04:49,874 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2024-05-16 23:04:51,217 - INFO - run_localGPT.py:132 - Loaded embeddings from hkunlp/instructor-large
Here is the prompt used: input_variables=['context', 'question'] output_parser=None partial_variables={} template='<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are a helpful assistant, you will use the provided context to answer user questions.\nRead the given context before answering questions and think step by step. If you can not answer a user question based on \nthe provided context, inform the user. Do not use any other information for answering user. Provide a detailed answer to the question.<|eot_id|><|start_header_id|>user<|end_header_id|>\n            Context: {context}\n            User: {question}<|start_header_id|>assistant<|end_header_id|>' template_format='f-string' validate_template=True
2024-05-16 23:04:51,262 - INFO - run_localGPT.py:60 - Loading Model: meta-llama/Meta-Llama-3-8B-Instruct, on: cuda
2024-05-16 23:04:51,263 - INFO - run_localGPT.py:61 - This action can take a few minutes!
2024-05-16 23:04:51,263 - INFO - load_models.py:153 - Using AutoModelForCausalLM for full models
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 304, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1325, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1823, in _raise_on_head_call_error
    raise head_call_error
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
    r = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
    response = _request_wrapper(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_errors.py", line 321, in hf_raise_for_status
    raise GatedRepoError(message, response) from e
huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66469113-262d9507724414dd5cfb44fe;6fba667e-d2e6-408d-b6c7-9b2467267c52)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "//run_localGPT.py", line 285, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "//run_localGPT.py", line 252, in main
    qa = retrieval_qa_pipline(device_type, use_history, promptTemplate_type=model_type)
  File "//run_localGPT.py", line 142, in retrieval_qa_pipline
    llm = load_model(device_type, model_id=MODEL_ID, model_basename=MODEL_BASENAME, LOGGING=logging)
  File "//run_localGPT.py", line 74, in load_model
    model, tokenizer = load_full_model(model_id, model_basename, device_type, LOGGING)
  File "/load_models.py", line 154, in load_full_model
    tokenizer = AutoTokenizer.from_pretrained(model_id, cache_dir="./models/")
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/tokenization_auto.py", line 819, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 928, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 631, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/configuration_utils.py", line 686, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py", line 416, in cached_file
    raise EnvironmentError(
OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.
401 Client Error. (Request ID: Root=1-66469113-262d9507724414dd5cfb44fe;6fba667e-d2e6-408d-b6c7-9b2467267c52)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. You must be authenticated to access it.
dportabella commented 4 months ago

Register and ask a hugging face token here: https://huggingface.co/settings/tokens

export HUGGING_FACE_HUB_TOKEN=<your_token>
docker run -it --mount src="$HOME/.cache",target=/root/.cache,type=bind --gpus=all -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN localgpt
  Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted and you are not in the authorized list. Visit 
  https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct to ask for access.

Visit that url, ask for authorization on that model, and run again.