Closed — chrisk314 closed this issue 3 months ago
I'm not sure why it's needed for Bedrock, but the error is related to accessing an HF model without a token. You need to sign up for a Hugging Face account and agree to the terms of the specific model.
Try:
pip install --upgrade huggingface_hub
huggingface-cli login --token [token from HF]
and see if you can run the generator again.
Thanks for the suggestion @lbux, I will try that out to see if it fixes the issue. However, it doesn't seem like a workable solution, as the AmazonBedrockChatGenerator is being executed by a service running in AWS ECS. It doesn't seem correct to use HF credentials for a personal account in that scenario.
AutoTokenizer is used to ensure the model doesn't exceed its prompt length, and the tokenizer hosted on HF is used for that. From my experience with Bedrock, there isn't native support for anything similar. The issue is that Mistral decided that users must agree to their terms on HF before having access to the model. So, unless there is a way to do tokenization without using HF, a valid HF token will be needed.
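To illustrate what that pre-send check amounts to, here is a minimal sketch. The function and variable names are hypothetical, and the whitespace tokenizer is only a stand-in for the real Hugging Face AutoTokenizer that Haystack loads for the Mistral model:

```python
def check_prompt_length(prompt: str, count_tokens, max_tokens: int) -> int:
    """Count tokens with the supplied tokenizer callable and fail fast
    if the prompt would exceed the model's context window."""
    n = count_tokens(prompt)
    if n > max_tokens:
        raise ValueError(
            f"Prompt is {n} tokens, exceeding the {max_tokens}-token limit"
        )
    return n

# Stand-in tokenizer for this sketch: whitespace splitting. In Haystack this
# role is played by a tokenizer loaded via AutoTokenizer.from_pretrained(...).
whitespace_count = lambda text: len(text.split())

print(check_prompt_length("What is the capital of France?", whitespace_count, 32))
```

The point of checking client-side is that an oversized prompt is rejected before any Bedrock API call is made, rather than being truncated or refused by the service afterwards.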
It's not ideal in your case. I believe you can bypass the need for the HF token by downloading the model and passing the local model path to from_pretrained() in chat/adapters.py, but that is even less ideal, as the file is quite large and any update to the library's AmazonBedrock component may break the "patch".
Just following up on this. I'm able to call Mistral models with Bedrock via LlamaIndex without creating a Huggingface account and setting Huggingface credentials. So this seems like a Haystack issue more than a Bedrock/Mistral issue.
import os

from llama_index.llms.bedrock import Bedrock

llm = Bedrock(
    model="mistral.mistral-large-2402-v1:0",
    profile_name=os.getenv("AWS_PROFILE"),
)
resp = llm.complete("What is the capital of France? Tell me a fun fact about French people.")
print(resp)
The capital of France is Paris. A fun fact about French people is that they consume around 300 types of cheese, making France the country with the largest variety of cheeses in the world
> Just following up on this. I'm able to call Mistral models with Bedrock via LlamaIndex without creating a Huggingface account and setting Huggingface credentials. So this seems like a Haystack issue more than a Bedrock/Mistral issue.
Yes, I took a look at their code, and they are not using a tokenizer to count the input. I don't think they're using anything to count the input besides letting you set a context_size. Unless they have other checks, it is possible to exceed the context size, and your input will either be truncated or rejected after the call is made. Haystack takes the alternative approach of counting your input before sending it off and returning an error if it's too long. I don't think either solution is ideal.
Mistral does provide their tokenizer on GitHub, with 3 different tokenizer versions depending on which Mistral model you use, but this would probably slow down the process, and it still does require you to have the model saved somewhere.
> it still does require you to have the model saved somewhere
It doesn't need to download a model. As per the docs, pip install mistral-common is all we need to generate the tokens and the properly tokenized text. In any case, it would still be better than using your own HF token in prod.
> Mistral does provide their tokenizer on GitHub with 3 different tokenizer versions
Based on the model's name, the tokenizer version can be identified. That should solve both problems: downloading a model and using the HF token.
Describe the bug
When attempting to use Mistral models with the AmazonBedrockChatGenerator, a 401 unauthenticated error response is returned from huggingface.co as shown below.

To Reproduce
Set the required AWS credentials env vars in the environment, then attempt to instantiate the AmazonBedrockChatGenerator using a Mistral model.

Describe your environment (please complete the following information):