Future-House / paper-qa

High accuracy RAG for answering questions from scientific documents with citations
Apache License 2.0

How to correctly set up a SentenceTransformerEmbeddingModel? #634

Open Snikch63200 opened 4 weeks ago

Snikch63200 commented 4 weeks ago

Hello,

I'm trying to set up a local SentenceTransformerEmbeddingModel:

  from paperqa.llms import SentenceTransformerEmbeddingModel

  sentence_transformer = SentenceTransformerEmbeddingModel(
      name='my-embedding-model',
      config=dict(
          model="openai/my-embedding-model",
          api_base="http://192.168.1.15:8081/v1/",
          api_key="sk-no-key-required",
      ),
  )

I got this error:

No sentence-transformers model found with name sentence-transformers/mxbai-embed-large-v1-f16. Creating a new one with mean pooling.
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\utils\_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\requests\models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/sentence-transformers/mxbai-embed-large-v1-f16/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\transformers\utils\hub.py", line 403, in cached_file
    resolved_file = hf_hub_download(
                    ^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 862, in hf_hub_download
    return _hf_hub_download_to_cache_dir(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 969, in _hf_hub_download_to_cache_dir
    _raise_on_head_call_error(head_call_error, force_download, local_files_only)
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 1484, in _raise_on_head_call_error
    raise head_call_error
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 1376, in _get_metadata_or_catch_error
    metadata = get_hf_file_metadata(
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 1296, in get_hf_file_metadata
    r = _request_wrapper(
        ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 277, in _request_wrapper
    response = _request_wrapper(
               ^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\file_download.py", line 301, in _request_wrapper
    hf_raise_for_status(response)
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\huggingface_hub\utils\_http.py", line 454, in hf_raise_for_status
    raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-671a406c-6e2e67be14ca59e360c52dbb;55c67e24-a903-40a6-87bc-ba62f67b317f)

Repository Not Found for url: https://huggingface.co/sentence-transformers/mxbai-embed-large-v1-f16/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\XXXXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 119, in <module>
    asyncio.run(my_query())
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\asyncio\runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\asyncio\base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "C:\Users\XXXXXXXXXXXXX\Desktop\PaperQA\paper_qa_question_async.py", line 99, in my_query
    sentence_transformer = SentenceTransformerEmbeddingModel(name='mxbai-embed-large-v1-f16',
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\paperqa\llms.py", line 222, in __init__
    self._model = SentenceTransformer(self.name)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\sentence_transformers\SentenceTransformer.py", line 318, in __init__
    modules = self._load_auto_model(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\sentence_transformers\SentenceTransformer.py", line 1522, in _load_auto_model
    transformer_model = Transformer(
                        ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\sentence_transformers\models\Transformer.py", line 75, in __init__
    config = AutoConfig.from_pretrained(model_name_or_path, **config_args, cache_dir=cache_dir)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\transformers\models\auto\configuration_auto.py", line 1006, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\transformers\configuration_utils.py", line 570, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\transformers\configuration_utils.py", line 629, in _get_config_dict
    resolved_config_file = cached_file(
                           ^^^^^^^^^^^^
  File "C:\ProgramData\anaconda3\envs\PaperQ_env\Lib\site-packages\transformers\utils\hub.py", line 426, in cached_file
    raise EnvironmentError(
OSError: sentence-transformers/mxbai-embed-large-v1-f16 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`

How can I fix this?

Thanks.

@Dosu.

dosubot[bot] commented 4 weeks ago

The error you're encountering indicates that the model name you're using, mxbai-embed-large-v1-f16, cannot be found on the Hugging Face Hub (sentence-transformers resolves a bare name under the sentence-transformers/ organization, so the lookup was for sentence-transformers/mxbai-embed-large-v1-f16, which is not a public repository), or it might be a private model that requires authentication.

Here are some steps to resolve this issue:

  1. Verify Model Name: Double-check the model name to ensure it is correct and available on the Hugging Face Hub. You can search for the model on Hugging Face Models to confirm its existence.

  2. Authentication for Private Models: If the model is private, you need to authenticate with the Hugging Face Hub. You can do this by logging in using the huggingface-cli login command in your terminal, which will prompt you to enter your Hugging Face credentials. Alternatively, you can pass your Hugging Face token directly in your script:

    from transformers import AutoConfig

    # Note: use_auth_token is deprecated in recent transformers releases;
    # token= is the current name for the same parameter.
    config = AutoConfig.from_pretrained(
        "sentence-transformers/mxbai-embed-large-v1-f16",
        use_auth_token="your_huggingface_token",
    )
  3. Check API Key and Base URL: Ensure that the api_base and api_key parameters are correctly set if you are using a custom API. Note, however, that SentenceTransformerEmbeddingModel loads its model locally through the sentence-transformers library, so these parameters are never used to reach the Hugging Face Hub or your own endpoint; you can drop them unless another part of your setup needs them (see the sketches after this list).

  4. Local Model Path: If you have the model downloaded locally, ensure that the path is correct and accessible. You can specify the local path directly in the name parameter, as shown in the example below.
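
Here is a minimal sketch tying steps 3 and 4 together. Judging from the traceback above (paperqa/llms.py line 222 calls SentenceTransformer(self.name)), only the name argument matters for this class: it must be either a valid Hugging Face model id or a folder on disk containing a sentence-transformers model. The model id and local path below are placeholders for illustration; substitute your own.

    from paperqa.llms import SentenceTransformerEmbeddingModel

    # Option A: a public Hugging Face model id (downloaded and cached on first use).
    # "sentence-transformers/all-MiniLM-L6-v2" is just an example of a valid public id.
    embedding = SentenceTransformerEmbeddingModel(
        name="sentence-transformers/all-MiniLM-L6-v2",
    )

    # Option B: a folder that already contains a sentence-transformers model
    # (config.json, tokenizer files, weights). The path below is hypothetical.
    embedding = SentenceTransformerEmbeddingModel(
        name=r"C:\models\my-embedding-model",
    )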

By following these steps, you should be able to resolve the issue and successfully load the model.
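
Separately, your original snippet points api_base at a local OpenAI-compatible server (http://192.168.1.15:8081/v1/). SentenceTransformerEmbeddingModel never talks to that server; it always loads weights locally via sentence-transformers. If the goal is to embed through that endpoint, the litellm route is the one that honours model="openai/...", api_base, and api_key. Here is a rough sketch to check that the endpoint itself answers, independent of paper-qa (model, api_base, and api_key are copied from your snippet):

    import litellm

    # Ask the local OpenAI-compatible server for an embedding directly,
    # to confirm it is reachable and serves the expected model.
    response = litellm.embedding(
        model="openai/my-embedding-model",
        input=["quick connectivity test"],
        api_base="http://192.168.1.15:8081/v1/",
        api_key="sk-no-key-required",
    )
    print(response)  # inspect the returned embedding payload

If that call succeeds, the LiteLLM-backed embedding model in paperqa.llms (LiteLLMEmbeddingModel in recent releases) is the class whose config is meant to carry api_base and api_key; check its signature in your installed version, since the exact config shape varies between releases.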

To continue talking to Dosu, mention @dosu.