Closed — chrisk314 closed this issue 3 months ago
I'm not sure why it's needed for Bedrock, but the error is related to accessing an HF model without a token. You need to sign up for a Hugging Face account and agree to the terms of the specific model.
Try:
pip install --upgrade huggingface_hub
huggingface-cli login --token [token from HF]
and see if you can run the generator again.
Thanks for the suggestion @lbux, I will try that out to see if it fixes the issue. However, it doesn't seem like a workable solution, as the AmazonBedrockChatGenerator is being executed by a service running in AWS ECS. It doesn't seem correct to use HF credentials for a personal account in that scenario.
AutoTokenizer is used to ensure the model doesn't exceed its prompt length, and the tokenizer hosted on HF is used for that. From my experience with Bedrock, there isn't native support for anything similar. The issue is that Mistral decided that users must agree to their terms on HF before having access to the model. So, unless there is a way to do tokenization without using HF, a valid HF token will be needed.
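To illustrate what that pre-send check amounts to, here is a minimal sketch. The function and variable names are hypothetical, and the whitespace tokenizer is only a stand-in for the real Hugging Face AutoTokenizer that Haystack loads for the Mistral model:

```python
def check_prompt_length(prompt: str, count_tokens, max_tokens: int) -> int:
    """Count tokens with the supplied tokenizer callable and fail fast
    if the prompt would exceed the model's context window."""
    n = count_tokens(prompt)
    if n > max_tokens:
        raise ValueError(
            f"Prompt is {n} tokens, exceeding the {max_tokens}-token limit"
        )
    return n

# Stand-in tokenizer for this sketch: whitespace splitting. In Haystack this
# role is played by a tokenizer loaded via AutoTokenizer.from_pretrained(...).
whitespace_count = lambda text: len(text.split())

print(check_prompt_length("What is the capital of France?", whitespace_count, 32))
```

The point of checking client-side is that an oversized prompt is rejected before any Bedrock API call is made, rather than being truncated or refused by the service afterwards.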
It's not ideal in your case. I believe you can bypass the need for the HF token by downloading the model and passing the local model path to from_pretrained() in chat/adapters.py, but that is even less ideal, as the file is quite large and any update to the library's AmazonBedrock component may break the "patch".
Just following up on this. I'm able to call Mistral models with Bedrock via LlamaIndex without creating a Huggingface account and setting Huggingface credentials. So this seems like a Haystack issue more than a Bedrock/Mistral issue.
import os

from llama_index.llms.bedrock import Bedrock

llm = Bedrock(
    model="mistral.mistral-large-2402-v1:0",
    profile_name=os.getenv("AWS_PROFILE"),
)
resp = llm.complete("What is the capital of France? Tell me a fun fact about French people.")
print(resp)
The capital of France is Paris. A fun fact about French people is that they consume around 300 types of cheese, making France the country with the largest variety of cheeses in the world
> Just following up on this. I'm able to call Mistral models with Bedrock via LlamaIndex without creating a Huggingface account and setting Huggingface credentials. So this seems like a Haystack issue more than a Bedrock/Mistral issue.
Yes, I took a look at their code, and they are not using a tokenizer to count the input. I don't think they're using anything to count the input besides letting you set a context_size. Unless they have other checks, it is possible to exceed the context size, and your input will either be truncated or rejected after the call is made. Haystack takes the alternative approach of counting your input before sending it off and returning an error if it's too long. I don't think either solution is ideal.
Mistral does provide their tokenizer on GitHub, with 3 different tokenizer versions depending on which Mistral model you use, but this would probably slow down the process, and it still does require you to have the model saved somewhere.
> it still does require you to have the model saved somewhere
It doesn't need to download a model. As per the docs, pip install mistral-common is all we need to generate the tokens and the properly tokenized text. In any case, it would still be better than using your own HF token in prod.
> Mistral does provide their tokenizer on GitHub with 3 different tokenizer versions
Based on the model's name, the tokenizer version can be identified. That should solve both problems: downloading a model and using the HF token.
Describe the bug
When attempting to use Mistral models with the AmazonBedrockChatGenerator, a 401 unauthenticated error response is returned from huggingface.co as shown below.

To Reproduce
Set the required AWS credentials env vars in the environment, then attempt to instantiate the AmazonBedrockChatGenerator using a Mistral model.

Describe your environment (please complete the following information):